Preface


Uncertainty quantification is both a new field and one that is as old as the 
disciplines of probability and statistics. The present novelty lies in the synthesis 
of probability, statistics, model development, mathematical and numerical analysis,
large-scale simulations, experiments, and disciplinary sciences to provide a compu- 
tational framework for quantifying input and response uncertainties in a manner 
that facilitates predictions with quantified and reduced uncertainty. This is the 
topic of this book. 

Uncertainty quantification for physical models can be motivated in the context
of weather modeling. Models for complex phenomena, such as dust-induced cloud
formation, are approximate and uncertain, as are the parameters in these models.
Additional errors and uncertainties are introduced by the numerical algorithms and
experimental data used to approximate and calibrate the models. In the first step
of the prediction process, data assimilation or model calibration techniques are used
to determine input parameters and initial conditions so that quantities of interest,
such as temperature or relative humidity, match current conditions. The second step
entails running the calibrated models forward in time to produce predictions with
uncertainties quantified by probabilistic statements (e.g., 95% chance of rain) or
uncertainty cones of the type reported for hurricanes or tropical storms.

Whereas model calibration and uncertainty propagation comprise the primary
aspects of the prediction process, their implementation for large-scale applications
requires a wide range of supporting topics. These include aspects of probability,
statistics, functional analysis, and numerical analysis as well as the following topics:
parameter selection, surrogate model construction, local and global sensitivity analysis,
and quantification of model discrepancies. The interdisciplinary nature of the field
is augmented by the fact that all of these components must be investigated and
implemented in the context of the underlying applications.

The explosive growth of uncertainty quantification as an interdisciplinary field
is due to a number of factors: increasing emphasis on models having quantified
uncertainties for large-scale applications, novel algorithm development, and new
computational architectures that facilitate implementation of these algorithms.

In Chapter 2, we detail five applications where model predictions with quantified
uncertainties are critical for understanding and predicting scientific phenomena
and making informed decisions and designs based on these predictions. These
applications are weather models, climate models, subsurface hydrology and geology
models, nuclear reactor models, and models for biological phenomena. Whereas the
presence and role of uncertainties in these applications have long been recognized,
the development of computational models that quantify and incorporate uncertainties
is receiving increased attention. The reliance of scientists and policy makers on
such models is expected to grow rapidly as the field of uncertainty quantification
for predictive sciences matures and computational resources evolve.

The relatively recent development of supporting mathematical and statistical
theory and algorithms is a second factor supporting the growth of the field. For
example, the adaptive DRAM and DREAM algorithms discussed in Chapter 8 for
Bayesian model calibration were developed within the last ten years. These
algorithms are presently being investigated in the context of climate and groundwater
models. Similarly, much of the sparse grid theory discussed in Chapter 11 was
developed in the last twenty years, although its origins are much older.

The availability of massively parallel computer architectures and hardware
has further bolstered uncertainty quantification for complex and large-scale
applications. The DREAM algorithms are inherently parallel, and recent versions of
DRAM are being implemented on parallel architectures. It is anticipated that field
programmable gate arrays (FPGAs) will be increasingly utilized for uncertainty
quantification as high-level tools are developed to reduce programming overhead.
The fact that we operate in increasingly data-rich environments will also benefit
uncertainty quantification, and we anticipate increased interaction between data
mining, high-dimensional visualization, and uncertainty quantification.

The growth in the field has spawned the introduction of interdisciplinary
courses on uncertainty quantification, and this text owes its genesis to the author's
development of such a course at North Carolina State University. This text
was written with the goal of introducing advanced undergraduates, graduate
students, postdocs, and researchers in mathematics, statistics, engineering, and natural
and biological sciences to the various topics comprising uncertainty quantification
for predictive models. To achieve this, we motivate a number of the topics using
very basic examples that should be familiar to most readers. We have included
numerous definitions and significant detail to provide a common footing for a wide
range of readers. Because this is a new and evolving field, we indicate open research
questions at various points in the text and provide research references in the Notes
and References at the end of each chapter.

Various resources will be maintained at the website http://www.siam.org/books/cs12
to augment the text and provide a mechanism to update the material.
This includes data employed in exercises as well as future errata.

This text has benefited significantly from graduate students, postdocs, and
colleagues whose comments have improved the exposition and reduced the number
of typos by orders of magnitude. Specifically, sincere thanks are extended to
Nate Burch, Amanda Coot is, John Crews, John Harlim, Zhengzheng Hu, Dustin
Kapraun, Zack Kenz, Christine Batten, Jerry McMahan Jr., Keri Rehm, Island
Wentworth, and Lucus Van Blair cum for their attention to detail and candid
feedback regarding parts of the manuscript. The author is also extremely grateful to
Brian Adams and Karen Willcox for their feedback during the review process; the
book is significantly improved due to their detailed comments.
Notation


This compilation does not include all of the symbols used throughout the text,
and we neglect those that appear one time in a specific context such as those in the
models of Chapter 2. Instead, it is meant to clarify the role of symbols that appear
multiple times throughout the discussion.


| Symbol | Meaning | Page |
| $\partial\mathcal{D}, \partial\Omega$ | Boundaries of regions $\mathcal{D}$ and $\Omega$ | 62, 63 |
| $\alpha(q^* \mid q^{k-1})$ | Probability of accepting candidate $q^*$ | 159 |
| $\gamma_i$ | Normalization factor for $\langle\cdot,\cdot\rangle$ | 206 |
| $\Gamma_i, \Gamma$ | Range of $i$th random parameter, random vector | 108 |
| $\delta, \delta(x)$ | Model discrepancy or error | 133, 257 |
| $\varepsilon, \epsilon$ | Random and realized measurement errors | 32, 132 |
| $\Lambda, \Lambda_M$ | Lebesgue constant | 252 |
| $\mu$ | Mean | 70 |
| $\mu_i, \mu_i^*, \sigma_i^2$ | Morris sensitivity measures | 332 |
| $\nu$ | Dimension of model response $y(t,q)$ | 61 |
| $\pi_0(q), \pi(q \mid \upsilon)$ | Bayesian prior and posterior density | 100 |
| $\pi(\upsilon \mid q)$ | Bayesian likelihood function | 100 |
| $p_{Q_i}(q_i), p_Q(q)$ | Density for $i$th random parameter, random vector | 108 |
| $\sigma_0^2$ | Unknown measurement error variance | 135 |
| $\hat{\sigma}^2, \sigma^2$ | Estimator and estimate for $\sigma_0^2$ | 135 |
| $\sigma_i$ | Singular values of the matrix $A$ | 117 |
| $\Sigma$ | Matrix of singular values of matrix $A$ | 117 |
| $\upsilon$ | Realized measurements of $\Upsilon$ | 132, 156 |
| $\Upsilon$ | Random variable for measurements | 82 |
| $\phi_j(x)$ | Spatial basis functions | 219 |
| $\mathbf{x}$ | Independent variables $\mathbf{x} = [x, t] \in \mathcal{D} \times \mathcal{T}$ | 63 |
| $\psi_k(Q), \Psi_k(Q)$ | Univariate, multivariate orthogonal polynomials | 209, 213 |
| $A(q,p)$ | Sparse grid quadrature operator | 247 |
| $B(u,q), B(q)u$ | Boundary operators | 62, 63 |
| $C$ | Observation matrix or vector | 61 |
| $d_i$ | Morris elementary effects for $i$th input | 331 |
| $D, D_i, D_{ij}$ | Total and partial variances of response $Y$ | 324 |
| $\mathcal{D}$ | Spatial domain in $\mathbb{R}$, $\mathbb{R}^2$, or $\mathbb{R}^3$ | 62 |
| | Model response | 132 |
| $f(q), f(t,x,q)$ | Surrogate model | 274 |
| | Source terms | 62 |
| $\mathcal{H}_u, \mathcal{H}_q, \mathcal{H}_f$ | Hilbert spaces for state, parameters, and source | 63 |
| $H_i(Q)$ | Hermite polynomials | 210 |
| | Sparse quadrature grid | 248 |
| $\mathbf{i}, \mathbf{j}, \mathbf{k}$ | Multi-indices | 212 |
| $I^{(p)}$ | Integral operator in $\mathbb{R}^p$ | 240 |
| | Multi-index sets | 216 |
| | Interpolation operator in $\mathbb{R}^p$ | 254 |
| $\mathcal{I}(q)$ | Identifiable subspace | 113 |
| | Space of influential parameters | 114 |
| $J(q)$ | Least squares functional | 136 |
| | Proposal or jumping distribution | 159 |
| $\ell(q \mid \upsilon)$ | Log-likelihood function | 83 |
| $L(q \mid \upsilon)$ | Likelihood function | 83 |
| | Linear operator | 63 |
| | Lagrange interpolating polynomial | 261 |
| | Square integrable functions | 216 |
| $M$ | Number of collocation points or samples | 253 |
| $n$ | Number of measurements or model evaluations | 61 |
| $N$ | Dimension of state $u$ | 61 |
| | Unidentifiable subspace | 113 |
| | Space of noninfluential parameters | 114 |
| $\mathcal{N}(A)$ | Null space of the matrix $A$ | 116 |
| | Linear or nonlinear differential operator | 62 |
| $p$ | Number of parameters | 100 |
| $P_i(Q)$ | Legendre polynomials | 211 |
| $\mathbb{P}_k$ | Space of polynomials with degree less than or equal to $k$ | 208 |
| | Polynomials in $\mathbb{P}_k$ that are orthogonal to $\mathbb{P}_{k-1}$ | 208 |
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Syrnl >ol 

Meaning 

l-’age 


ft) 

lYue but unknown parameter 

82 

<! = [tfi f h>} 

Realizations of Q 

100 


Proposed Markov chain parameter 

150 

q*- 1 

Parameter at k — 1 step in Markov chain 

159 

q'\<r 

Quadrature, collocation, and sample points 

21 L, 217 

r /i :■ (. s > Qol$ 

Least squares estimator, estimate for jJq 

82 

f Ju a r 

Maximum a posteriori estimate 

i r> 7 

LI i' 

Maximum likelihood estimate 

8-1 

Q [Qij - - ■ )Q P ] 

Random vector of parameters 

inn 

Q 

Orthogonal matrix in QR factorization 

118 

Q 

Admissible parameter space 

82 

o 

Sample space 

82 

q(p) 

Quadrature operator in E ,:J 

240 

r 

Rank of matrix A 

117 

jt, Si 

Number of quadrature points 

243 

/? 

Upper triangular matrix in QR factorisation 

118 

/t 1 

Residual estimator and estimate 

136 

n 

Number of sparse grid quadrature points 

248 

7?.(u, r/) 

General observation or response 

f>3 

7t(A) 

Range of the matrix A 

116 


Local sensitivity indices 

192, 322 

5 f 

Sigma-normalized sensitivity indices 

322 

Si.,-, S T , 

Sobol sensitivity indices 

324 

ss v 

Suii] of squares error 

136 

T 

Temporal domain 

63 

u{q). u(t,x<q) 

State variable 

61 

u[l.,. f:,<j ) 

Surrogate state represent^ ion 

279 

v, V J 

Spaces of spatial test functions 

219 

V k 

Chain covariance matrix 

172 

il.7 J 

Quadrature weights 

21 1 

X 

Deterministic n x p design matrix 

131 

*(<?) 

Sensitive y matrix 

] 44 

V 

Realizations of Y 

132 

Y 

Random variable for model response 

321 

z, z A ' 

Sfiace? of parameter test functions 

21 fl 
| Term | Meaning | Page |
| ANOVA | Analysis of variance | 201 |
| AR | Autoregressive (model) | |
| ASAP | Adjoint sensitivity analysis procedure | 306 |
| BWR | Boiling water reactor | 30 |
| càdlàg | Continue à droite, limites à gauche | |
| CASL | Consortium for Advanced Simulation of Light Water Reactors | |
| cdf | Cumulative distribution function | |
| CESM | Community Earth System Model | |
| CFCs | Chlorofluorocarbons | |
| CRUD | Chalk River unidentified deposit | 13 |
| CVTs | Centroidal Voronoi tessellations | |
| DAKOTA | Design Analysis Kit for Optimization and Terascale Applications | |
| DOE | Department of Energy | |
| DRAM | Delayed rejection adaptive Metropolis | 172 |
| DREAM | DiffeRential Evolution Adaptive Metropolis | 181 |
| ECMWF | European Centre for Medium-Range Weather Forecasts | 16 |
| FPGAs | Field programmable gate arrays | x |
| FSAP | Forward sensitivity analysis procedure | 306 |
| gcd | Greatest common divisor | 94 |
| GCR | Gas-cooled reactor | 30 |
| GP | Gaussian process | |
| gPC | Generalized polynomial chaos | 207 |
| HDMR | High-dimensional model representation | 289 |
| HIV | Human immunodeficiency virus | 45 |
| iid | Independent and identically distributed | 79 |
| IPCC | Intergovernmental Panel on Climate Change | 32 |
| KDE | Kernel density estimation | 75 |
| MAP | Maximum a posteriori (estimate) | 157 |
| MCMC | Markov chain Monte Carlo | 159 |
| NISP | Nonintrusive spectral projection | 225 |
| NWP | | 15 |
| ODE | Ordinary differential equation | 51 |
| OLS | Ordinary least squares | 82 |
| ORNL | Oak Ridge National Laboratory | 41 |
| PC | Polynomial chaos | 207 |
| PCA | Principal component analysis | 109 |
| PDE | Partial differential equation | 51 |
| pdf | Probability density function | 59 |
| PRA | Probabilistic risk assessment | 44 |
Chapter 1

Introduction 


The synthesis of modeling, large-scale simulations, and experiments has long been
recognized as critical for understanding and advancing the state of science and
technology. When considered in the broad sense of including requisite theory, these
form the pillars of predictive science, as illustrated in Figure 1.1. In the context of
predictive science, uncertainty quantification can be broadly defined as the science
of identifying, quantifying, and reducing uncertainties associated with models,
numerical algorithms, experiments, and predicted outcomes or quantities of interest.
Aspects of this field, such as the quantification of measurement uncertainties and
numerical errors, are well understood and are addressed by classical statistics and
numerical analysis theory. However, the systematic quantification of uncertainties
and errors in models, simulations, and experiments and the analysis of how they
are propagated through complex models to affect predicted outcomes are more recent
and constitute both an active area of research and the subject of this text.

In Chapter 2, we detail five large-scale applications where model predictions
with quantified uncertainties are critical for understanding and predicting scientific 
phenomena and making informed decisions and designs based upon these predic- 
tions. These applications are weather models, climate models, subsurface hydrology 
and geology models, nuclear reactor designs, and models for biological phenomena. 



Figure 1.1. Modeling, numerical, and experimental components of predictive
science with associated uncertainties and errors indicated in gray.




In the following example, we summarize aspects of the weather model detailed
in Section 2.1 to motivate sources of uncertainty and indicate issues that must be
addressed when making predictions.


Example 1.1 (Weather Prediction). 

The physical components of meteorological or weather models are constructed
by quantifying the interactions between temperature and pressure gradients, wind,
and precipitation using conservation of energy, mass, and momentum. When
combined with conservation of water phases and aerosol concentrations, this yields the
equations of atmospheric physics,


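$$
\begin{aligned}
&\frac{\partial \rho}{\partial t} + \nabla\cdot(\rho v) = 0,\\
&\frac{\partial v}{\partial t} = -v\cdot\nabla v - \frac{1}{\rho}\nabla p - g\hat{k} - 2\Omega\times v,\\
&\rho c_v\left(\frac{\partial T}{\partial t} + v\cdot\nabla T\right) = -\nabla\cdot F + \nabla\cdot(\kappa\nabla T) + \rho q(T, p, \rho),\\
&p = \rho R T,\\
&\frac{\partial m_j}{\partial t} = -v\cdot\nabla m_j + S_{m_j}(T, m_j, \chi_j, \rho), \quad j = 1, 2, 3,\\
&\frac{\partial \chi_j}{\partial t} = -v\cdot\nabla \chi_j + S_{\chi_j}(T, \chi_j, \rho), \quad j = 1, \ldots, J,
\end{aligned}
\tag{1.1}
$$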


where $\rho$, $v$, $T$, $p$, $\kappa$, and $c_v$ respectively denote the density, velocity, temperature,
pressure, thermal conductivity, and specific heat of air. The concentration of water
in solid, liquid, and gaseous phases is denoted by $m_1$, $m_2$, and $m_3$, whereas the
concentration of the $j$th aerosol species is denoted by $\chi_j$.

For the reasons discussed in Section 2.1, one typically constructs phenomenological
models for the source terms $S_{m_j}(T, m_j, \chi_j, \rho)$ and $S_{\chi_j}(T, \chi_j, \rho)$. For example,
it is established in (2.8) that $S_{m_2}$ can be formulated as

$$
S_{m_2} = S_1 + S_2 + S_3,
\tag{1.2}
$$

where

$$
S_1 = \frac{\rho\,(m_2 - \bar{m}_2)^2}{1.2\times 10^{-4} + \dfrac{1.569\times 10^{-12}\, N_r}{d_0\,(m_2 - \bar{m}_2)}}.
\tag{1.3}
$$

The relation (1.3) requires the specification of the nonphysical parameters $\rho$, $\bar{m}_2$, $N_r$,
and $d_0$. The remaining components have similar formulations.
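To make the role of the nonphysical parameters in (1.3) explicit, the following Python sketch evaluates $S_1$ as written. The argument names are our own, the values in the usage line are arbitrary placeholders rather than calibrated values from Section 2.1, and returning zero when $m_2 \le \bar{m}_2$ is our assumption, since (1.3) is only meaningful for $m_2 > \bar{m}_2$.

```python
def autoconversion_rate(rho, m2, m2_bar, n_r, d0):
    """Evaluate S_1 from (1.3).

    rho    : air density
    m2     : liquid water concentration
    m2_bar : threshold parameter (nonphysical)
    n_r    : nonphysical parameter N_r
    d0     : nonphysical parameter d_0
    Returns 0.0 when m2 <= m2_bar (an assumption; the formula is
    written for m2 > m2_bar).
    """
    excess = m2 - m2_bar
    if excess <= 0.0:
        return 0.0
    denom = 1.2e-4 + 1.569e-12 * n_r / (d0 * excess)
    return rho * excess**2 / denom


# Placeholder inputs only; perturbing them shows how input
# uncertainty in (rho, m2_bar, n_r, d0) changes S_1.
rate = autoconversion_rate(rho=1.0, m2=2e-3, m2_bar=1e-3, n_r=100.0, d0=0.15)
```

Because $N_r$ enters only the denominator, increasing it with the other inputs fixed lowers $S_1$, which is one quick sanity check on any implementation.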

Model Errors or Discrepancies. Both the conservation relations (1.1) and
phenomenological closure equations (1.2) and (1.3) are approximations of the true
underlying physics. Furthermore, phenomena such as the conversion of cloud droplets
to rain drops, quantified by (1.3), occur on much smaller scales than the numerical
grids employed when solving (1.1). The resulting model errors or discrepancies
produce biased or systematic uncertainties that are typically difficult to quantify using
a probabilistic framework.

Input Uncertainties. Parameters such as $\rho$, $\bar{m}_2$, $N_r$, and $d_0$ in the phenomenological
representation (1.3) for $S_1$ are uncertain, as are initial conditions for the
evolution equations (1.1). These comprise input uncertainties that are often amenable
to probabilistic analysis.

Numerical Errors and Uncertainties. As detailed in Section 2.1.3, local
meteorological models are numerically approximated on spatial grids having horizontal
spacing on the order of 5 km and vertical spacing of approximately 200 m. This
introduces numerical discretization or approximation errors. Furthermore, it
introduces systematic uncertainties due to the fact that parameterized processes, such
as aerosol-induced cloud formation and atmospheric turbulence, occur on much
smaller, subgrid, scales.


Measurement Errors and Uncertainties. It is noted in Section 2.1.4 that
meteorological data consist of earth-surface and atmospheric measurements. The
latter are obtained from weather balloons, weather satellites, and aircraft. There are
two primary sources of uncertainty: limited accuracy of the sensors and uncertainty
associated with the time and location of measurements.


Predictions for Weather Forecasts. Uncertainty quantification for weather
forecasting takes place in two steps. In the first, data assimilation, often performed
in a Bayesian framework, is used to determine values and quantify uncertainties
for inputs such as initial conditions and phenomenological parameters. This is the
model calibration step. In the second step, the calibrated models are run forward
in time to provide forecasts with quantified uncertainties.
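Operational data assimilation is far more elaborate than anything that fits here, but the Bayesian idea behind the calibration step can be sketched for a single scalar input, such as an initial temperature. The Python example below assumes a conjugate normal-normal model: a Gaussian prior for the input and independent Gaussian measurement errors with known variance. The function name and all numbers are hypothetical.

```python
import numpy as np


def gaussian_update(prior_mean, prior_var, obs, noise_var):
    """Conjugate normal-normal update for one scalar input q.

    Assumes obs[i] = q + eps_i with eps_i ~ N(0, noise_var) iid and a
    prior q ~ N(prior_mean, prior_var). Returns the posterior mean and variance.
    """
    obs = np.asarray(obs, dtype=float)
    # Precisions (inverse variances) add under this model.
    post_precision = 1.0 / prior_var + obs.size / noise_var
    post_var = 1.0 / post_precision
    # Posterior mean is a precision-weighted blend of prior and data.
    post_mean = post_var * (prior_mean / prior_var + obs.sum() / noise_var)
    return post_mean, post_var


# Hypothetical example: prior belief 12 +/- 2 degrees, 20 readings with sd 0.5.
rng = np.random.default_rng(0)
obs = 15.0 + rng.normal(0.0, 0.5, size=20)
post_mean, post_var = gaussian_update(12.0, 4.0, obs, 0.25)
```

With no observations the function returns the prior unchanged; as readings accumulate, the posterior variance shrinks and the mean moves toward the data, which is the quantified-uncertainty output that the forward forecasting step then consumes.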

To accommodate the effects of input, model, numerical, and measurement 
uncertainties, ensemble forecasts are computed by running multiple simulations 
from individual or multiple models with differing initial conditions or parameter 
values drawn from probability densities constructed during the calibration phase. 
Using the ensemble predictions, one computes statistical quantities of interest, such 
as the average temperature, relative humidity, or projected rain amounts. 
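The ensemble procedure described above can be sketched in a few lines. The Python example below is a toy stand-in, not a weather model: it assumes a hypothetical relaxation law $T(t) = T_{env} + (T_0 - T_{env})e^{-kt}$, draws the initial condition $T_0$ and rate $k$ from densities we simply posit (in practice these would come from the calibration phase), runs one member per draw, and summarizes the forecast by the ensemble mean and a 95% interval.

```python
import numpy as np


def toy_model(t, T0, k, T_env=10.0):
    """Hypothetical relaxation model T(t) = T_env + (T0 - T_env) exp(-k t)."""
    return T_env + (T0 - T_env) * np.exp(-k * t)


def ensemble_forecast(t, n_members=1000, seed=0):
    """Run one toy-model member per sampled (T0, k) pair and summarize."""
    rng = np.random.default_rng(seed)
    # Densities standing in for the output of a calibration phase (assumed).
    T0 = rng.normal(20.0, 1.0, size=n_members)
    k = rng.lognormal(mean=np.log(0.1), sigma=0.2, size=n_members)
    members = toy_model(t, T0, k)
    lo, hi = np.percentile(members, [2.5, 97.5])
    return members.mean(), (lo, hi), members


mean, (lo, hi), members = ensemble_forecast(t=5.0)
```

The mean plays the role of the reported forecast, while the interval $(lo, hi)$ is the kind of uncertainty statement that, as noted next, is usually computed but not reported.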

Although uncertainties associated with quantities of interest are computed
during the ensemble computations, they typically are not reported in forecasts.
One exception is the prediction of large storms such as cyclones, tropical storms,
or hurricanes. This is illustrated in Figure 1.2 by the predicted trajectory and
uncertainty cone for the post-tropical cyclone Debby.

The following definitions formalize terms, introduced in Example 1.1, that play
a fundamental role throughout the text.


Definition 1.2 (Inputs). The term inputs is used to designate parameters,
initial conditions, boundary conditions, or exogenous forces that exhibit uncertainties
which must be determined and propagated through models to construct
predictions with quantified uncertainties. Known or fixed coefficients, independent and
dependent variables, and control signals do not constitute inputs as defined here.







Figure 1.2. NOAA image of the trajectory and cone of uncertainty for the
post-tropical cyclone Debby.

Definition 1.3 (Quantity of Interest (QoI)). The QoI designates the output
of a simulation model or experiment that provides information necessary to draw
conclusions or make decisions about the process. In many contexts, we employ the terms
model response or model output in a synonymous manner. As illustrated in
Example 1.1, we often consider statistical or probabilistic QoI to accommodate
uncertainties intrinsic to the modeled process. Examples of probabilistic QoI include average
temperatures, expected precipitation in a viewing region, average performance of
a nuclear reactor, and expected impact of drilling in an environmentally fragile
region. Chapter 2 details QoI for weather, climate, groundwater, nuclear reactor, and
systems biology models.


Definition 1.4 (Verification). Verification refers to the process of quantifying the
accuracy of simulation codes used to implement mathematical models.


Definition 1.5 (Validation). Validation describes the process of determining the
accuracy with which mathematical models quantify the physical processes of inter- 
est. This necessarily involves the simulation code used to implement the model and 
experimental data from the process. 


1.1 Nature of Uncertainties and Errors

In Example 1.1, we illustrated that uncertainties and errors arise in the modeling, 
simulation, and experimental components of applications. We detail the sources 
and nature of these uncertainties in this section.


1.1.1 Experimental Uncertainties and Limitations 

Experimental results are believed by everyone, except for the person who ran the 
experiment, quoted by Max Gunzburger, Florida State University; original source
unknown. 

There are two fundamental sources of uncertainty and errors in experiments: 
limited or incomplete data and limited accuracy or resolution of sensors. The first 
can be broadly interpreted as due to the fact that experiments are often surrogates
or provide only partial measurements when we cannot fully observe the underlying
application due to physical infeasibility or expense. Examples include the following. 

• The meteorological data noted in Example 1.1 are obtained at discrete locations
that can be uncertain.

• Wind tunnel tests are used as surrogates for flight tests. The limitations of
using a scale model in lieu of a full aircraft must be incorporated in designs.

• Pharmaceutical and disease treatment strategies are often too dangerous or
expensive for human tests or large segments of the population. For example,
HIV trials are conducted with test subjects rather than a full population.

• Climate scenarios cannot be experimentally tested at the planet scale. Instead,
forcing mechanisms such as those due to volcanic eruptions are tested using
measurements such as the 1991 Mount Pinatubo data; see Section 2.2.1.

• In materials experiments, difficulties obtaining nano- and molecular-level time
and spatial scale data limit multiscale testing of novel material designs.

• Subsurface hydrology data are very limited due to the expense and infeasibility
of drilling large numbers of wells. As a result, there is significant uncertainty
regarding specific subsurface structures; see Section 2.3.

• The harsh radioactive, thermal, and chemical environments in a nuclear reactor
core limit the availability of measurements for performance improvement,
nondestructive evaluation, and safety regulation; see Section 2.4.

Whereas several of these examples illustrate limited rather than statistically
uncertain data, the associated deficiencies increase the reliance on models and augment
model uncertainties due to lack of data.

The limited accuracy or resolution of sensors contributes statistical uncer- 
tainties that can produce parameter uncertainties during model calibration. These 
sensor uncertainties can occasionally be specified by sensor manufacturers and are 
often amenable to statistical analysis.
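The link between sensor noise and parameter uncertainty can be seen by repeating a synthetic experiment many times. In the Python sketch below, which rests entirely on assumptions of ours (a straight-line response $y = 1 + 2x$, 20 measurement points, additive Gaussian sensor noise), each experiment is calibrated by a least-squares line fit, and the spread of the fitted slopes is reported.

```python
import numpy as np


def estimate_spread(noise_sd, n_experiments=500, seed=0):
    """Std. dev. of the fitted slope across repeated synthetic experiments.

    Hypothetical setup: true response y = 1.0 + 2.0*x observed at 20
    points with additive Gaussian sensor noise of standard deviation noise_sd.
    """
    rng = np.random.default_rng(seed)
    x = np.linspace(0.0, 1.0, 20)
    slopes = np.empty(n_experiments)
    for i in range(n_experiments):
        y = 1.0 + 2.0 * x + rng.normal(0.0, noise_sd, size=x.size)
        slope, intercept = np.polyfit(x, y, deg=1)  # highest degree first
        slopes[i] = slope
    return slopes.std(ddof=1)


for sd in (0.01, 0.1):
    print(f"sensor noise sd={sd}: slope spread={estimate_spread(sd):.4f}")
```

Since the least-squares slope is linear in the data, the slope spread scales in proportion to the sensor noise level; noise-free data would yield no parameter spread at all.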

1.1.2 Model and Input Uncertainties 

Essentially, all models are wrong, but some are useful, George E. P. Box, page 424
of [38].

Model uncertainties arise from two sources: model errors or discrepancies and
input uncertainties due to uncertain parameters, forcing functions, and initial and
boundary conditions.

Model Errors and Discrepancies 

Modeling errors or discrepancies are due to approximate or imprecise rep- 
resentation of underlying physical, biological, economic, or social processes. The 
following examples illustrate sources of model error.
• Numerous components of weather and climate models (e.g., aerosol-induced
cloud formation, greenhouse gas processes, turbulence) occur on scales that
are much smaller than the numerical grids used to solve the atmospheric
equations of physics. Moreover, many of these processes represent highly
complex physics that is only partially understood. Consequently, the processes
are represented by phenomenological models with nonphysical parameters.
Both the model form and the parameters exhibit uncertainty.

• Many biological applications are coupled, complex, highly nonlinear, and
driven by poorly understood or stochastic processes. Moreover, they presently
do not admit an encompassing set of governing relations analogous to those
in physics. Hence, associated models are subject to significant uncertainty.

• The predicted production of greenhouse gases is highly dependent on projected
economic and technological growth of nations. These processes are highly
uncertain and difficult to model. This uncertainty is addressed in climate
models by considering various economic and technology growth scenarios.

The quantification of model errors is typically problem-dependent since it neces- 
sitates obtaining additional knowledge about the problem. The development of 
general statistical techniques, such as construction of Gaussian processes for model
discrepancies, constitutes a current research topic. 
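To give a flavor of the Gaussian process approach mentioned above, the Python sketch below places a zero-mean GP with a squared-exponential kernel on a discrepancy $\delta(x)$ and conditions it on residuals between synthetic data and a model that deliberately omits a term. The kernel, the hand-fixed hyperparameters (`amp`, `length`, `noise_var`), and the toy "truth" are all illustrative assumptions. This is not a complete discrepancy framework; for instance, it ignores the confounding between calibrated parameters and the discrepancy.

```python
import numpy as np


def sq_exp_kernel(a, b, amp=1.0, length=0.2):
    """Squared-exponential covariance between 1-D input arrays a and b."""
    d = a[:, None] - b[None, :]
    return amp**2 * np.exp(-0.5 * (d / length) ** 2)


def gp_discrepancy(x_obs, resid, x_new, noise_var=1e-4, amp=1.0, length=0.2):
    """Posterior mean and variance of a GP discrepancy delta(x) at x_new.

    resid holds data minus model output at x_obs. The prior on delta is a
    zero-mean GP; amp, length, and noise_var are fixed by hand (assumption).
    """
    K = sq_exp_kernel(x_obs, x_obs, amp, length) + noise_var * np.eye(x_obs.size)
    K_s = sq_exp_kernel(x_new, x_obs, amp, length)
    L = np.linalg.cholesky(K)                       # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, resid))
    mean = K_s @ alpha
    v = np.linalg.solve(L, K_s.T)
    var = amp**2 - np.sum(v**2, axis=0)             # latent (noise-free) variance
    return mean, var


# Hypothetical example: the "true" process has an extra 0.3*x term that the
# model f(x) = sin(2*pi*x) omits, so the residual is the discrepancy.
x_obs = np.linspace(0.0, 1.0, 15)
y_obs = np.sin(2 * np.pi * x_obs) + 0.3 * x_obs
resid = y_obs - np.sin(2 * np.pi * x_obs)
x_new = np.array([0.5, 3.0])
delta_mean, delta_var = gp_discrepancy(x_obs, resid, x_new)
```

Inside the data range the posterior tracks the missing term closely. Far from the data ($x = 3$), it reverts to the prior mean of zero and prior variance `amp**2`, which is one way such a surrogate signals that the learned discrepancy should not be trusted for extrapolation.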

Input Uncertainties 

All models contain parameters that must be specified before the model can 
be used to represent or predict the behavior of the process. Moreover, differential 
equation models have initial or boundary conditions that must be designated in 
addition to potential exogenous forces. As noted in Definition 1.2, these
components introduce input uncertainties that must be quantified and propagated through
models. 

• As indicated in Example 1.1 and shown in Sections 2.1-2.5, the
phenomenological models used to represent processes such as turbulence in weather,
climate, and nuclear reactor models have nonphysical parameters whose values
and uncertainties must be determined using measured data.

• It is shown in Section 2.2 that forcing and feedback mechanisms in climate
models serve as boundary inputs. These parameterized phenomenological 
relations introduce both model and parameter uncertainties. 

The process of estimating model inputs based on measured data is typically 
termed model calibration or simply parameter estimation if inputs consist solely of
parameters. The estimation of input uncertainties for a model using measured data
is often referred to as inverse uncertainty quantification.
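A minimal instance of parameter estimation with quantified input uncertainty is ordinary least squares for a model that is linear in its parameters. The Python sketch below assumes the hypothetical model $y = q_1 + q_2 x$ with independent, identically distributed Gaussian measurement errors. It forms the $n \times p$ design matrix $X$, computes the OLS estimate, the error-variance estimate $s^2$ (with $n - p$ in the denominator), and the parameter covariance estimate $s^2 (X^T X)^{-1}$. The function name and the synthetic data are illustrative only.

```python
import numpy as np


def ols_calibrate(x, y):
    """Fit y = q1 + q2*x by ordinary least squares.

    Returns the estimate q, the error-variance estimate s2 (n - p in the
    denominator), and the estimated parameter covariance s2 (X^T X)^{-1}.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones_like(x), x])   # n x p design matrix
    q, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ q
    n, p = X.shape
    s2 = resid @ resid / (n - p)
    cov = s2 * np.linalg.inv(X.T @ X)
    return q, s2, cov


# Hypothetical synthetic data with true parameters (2.0, 0.5).
rng = np.random.default_rng(1)
x = np.linspace(0.0, 10.0, 50)
y = 2.0 + 0.5 * x + rng.normal(0.0, 0.1, size=x.size)
q_hat, s2, cov = ols_calibrate(x, y)
```

The diagonal of `cov` gives variances for the calibrated inputs, which is the information that would subsequently be propagated through the model to a quantity of interest.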

Coupled Systems 

The quantification of model discrepancies and input uncertainties is typically
challenging for individual components of a system model. The difficulty grows
substantially as components are bidirectionally or tightly coupled to quantify
multiscale or multiphysics phenomena. For such problems, the tight coupling generally
prohibits a unidirectional propagation of discrepancies or uncertainties and instead
necessitates a more global accommodation of these terms. Furthermore, the nature
of inputs or parameters can change if they are outputs at another level in
the process. The applications discussed in Sections 2.1-2.5 yield system models
that are coupled to varying degrees, and the development of techniques to quantify
model discrepancies and input uncertainties and construct prediction intervals for
quantities of interest in such applications constitutes an active research area.


1.1.3 Numerical Errors and Uncertainties 

Computational results are believed by no one, except the person who wrote the code,
quoted by Max Gunzburger, Florida State University; original source unknown.

The characterization and regulation of numerical or algorithmic errors is a topic
in numerical analysis, and this is the least uncertain component of predictive
sciences. Numerical errors and uncertainties include the following.

• Roundoff, discretization, or approximation errors.

• Bugs or coding errors.

• Bit flipping, hardware failures, and uncertainty associated with future
exascale and quantum computing.
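Roundoff and discretization errors often compete. As an illustration of the first bullet (a standard textbook example, not one drawn from this text), the Python sketch below approximates $f'(1)$ for $f = \exp$ by a forward difference. Shrinking the step $h$ first reduces the truncation error, but below roughly $h \approx 10^{-8}$, cancellation in floating-point subtraction makes roundoff dominate.

```python
import numpy as np


def forward_diff_error(h, x=1.0):
    """Absolute error of (f(x + h) - f(x)) / h as an estimate of f'(x), f = exp.

    Truncation (discretization) error behaves like h * exp(x) / 2, whereas
    roundoff error grows roughly like eps * exp(x) / h as h shrinks.
    """
    approx = (np.exp(x + h) - np.exp(x)) / h
    return abs(approx - np.exp(x))


step_sizes = 10.0 ** -np.arange(1, 16)          # h = 1e-1, ..., 1e-15
errors = [forward_diff_error(h) for h in step_sizes]
best = int(np.argmin(errors))
for h, err in zip(step_sizes, errors):
    print(f"h={h:.0e}  error={err:.2e}")
```

The smallest error occurs at an intermediate step size rather than at the smallest one, so refining a discretization is not automatically beneficial once roundoff is accounted for.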

Additionally, the 5-50 km grids required for numerical solution of field
equations in applications such as the weather model outlined in Example 1.1 are
much larger than the scale of physics being modeled (e.g., turbulence or
aerosol-induced cloud formation). This numerical requirement introduces uncertainty in
phenomenological model relations.

1.1.4 Types of Uncertainty 

The previous examples illustrate that modeling, experimental, and numerical
uncertainties and errors can have various forms. The following definitions categorize
uncertainties based on the degree to which they are inherent to the application or
reflect lack of knowledge.

Definition 1.6 (Aleatoric Uncertainty). Also known as statistical, stochastic,
or irreducible uncertainty, this is uncertainty inherent to a problem or experiment
that in principle cannot be reduced by additional physical or experimental
knowledge. Examples include uncertainties associated with nonphysical model
parameters, subgrid atmospheric conditions such as wind gusts, subsurface microbe levels
between wells, and initial conditions for weather models. Aleatoric uncertainties
are typically unbiased and are often naturally defined in a probabilistic framework.
Hence, additional experiments or knowledge serve to better characterize the
uncertainty.
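The claim that more data characterizes, but does not remove, aleatoric uncertainty can be checked numerically. The Python sketch below assumes, purely for illustration, that a gust-like quantity is $N(5, 2^2)$. As the sample size grows, the estimated spread settles near the inherent value 2 rather than shrinking, while the standard error of the estimated mean decreases roughly like $1/\sqrt{n}$.

```python
import numpy as np


def spread_and_standard_error(n, sigma=2.0, seed=42):
    """Sample spread and standard error of the mean for n draws.

    The draws mimic an inherently variable quantity (e.g., wind-gust
    speed) modeled, by assumption, as N(5, sigma^2).
    """
    rng = np.random.default_rng(seed)
    sample = rng.normal(loc=5.0, scale=sigma, size=n)
    spread = sample.std(ddof=1)       # estimates sigma; does not shrink with n
    std_err = spread / np.sqrt(n)     # uncertainty in the mean; shrinks like 1/sqrt(n)
    return spread, std_err


for n in (10, 1_000, 100_000):
    spread, std_err = spread_and_standard_error(n)
    print(f"n={n:>7d}  spread={spread:.3f}  standard error={std_err:.5f}")
```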
Definition 1.7 (Epistemic or Systematic Uncertainties). Epistemic uncertainties
are those that are due to simplifying model assumptions, missing physics,
or basic lack of knowledge. Many of the previously mentioned modeling errors
(e.g., phenomenological expressions for input or closure relations) and numerical
errors are epistemic in nature. These uncertainties are often biased, and they are
typically less naturally defined in a probabilistic framework.

If numerical errors are negligible, epistemic uncertainties are often termed model errors, model discrepancies, or model inadequacies. As detailed in Chapter 12, the quantification of these components for extrapolatory predictions outside the calibration domain constitutes an active research area.


The distinction between aleatoric and epistemic uncertainties is not always clear since lack of knowledge is relative and depends on current theory and experimental capabilities. One goal of uncertainty quantification is to reformulate epistemic uncertainties as aleatoric uncertainties where probabilistic analysis is applicable.


1.2 Predictive Estimation 

A broad objective of predictive science is to use models, simulation codes, and experiments to predict system responses with quantified and reduced uncertainties. The probabilistic quantification of predicted experimental and computational outcomes with identified and quantified uncertainties is sometimes termed predictive estimation. As detailed in [52], predictive estimation comprises three components.

• Model Calibration: This involves the assimilation or integration of data to quantify and update input uncertainties associated with parameters, forcing functions, initial conditions, or boundary conditions.

• Model Prediction: Here one computes the response, or QoI, along with statistics, error bounds, or the probability density function (pdf) for the QoI.

• Estimation of the Validation Regime: This entails estimating contours of constant probability for the QoI to establish a domain for predictions with specified uncertainties.


To illustrate the predictive estimation process, we consider the mathematical model

$$y = f(x, q) \tag{1.4}$$

and statistical model

$$Y_i = f(x_i, q) + \delta(x_i) + \varepsilon_i, \tag{1.5}$$

where $q = [q_1, \ldots, q_p]$ is a vector of inputs, $y \in \mathbb{R}^m$ is the model output or QoI, and $x$ denotes independent variables such as time or spatial position. In the statistical model, $Y_i$ and $\varepsilon_i$ are random variables representing measurements and measurement errors, and $\delta(x_i)$ denotes biases due to epistemic or systematic uncertainties, as defined in Definition 1.7. If numerical errors are negligible, $\delta(x_i)$ quantifies the model error or model discrepancy.
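To make the separate roles of $f$, $\delta$, and $\varepsilon_i$ in the statistical model $Y_i = f(x_i, q) + \delta(x_i) + \varepsilon_i$ concrete, the following Python sketch generates synthetic measurements. The exponential model, the linear discrepancy, and the noise level are hypothetical choices for illustration only, not models from this text.

```python
import numpy as np

def model(x, q):
    # Hypothetical model f(x, q) = q_1 * exp(-q_2 * x), chosen only for illustration.
    return q[0] * np.exp(-q[1] * x)

def discrepancy(x):
    # Hypothetical systematic bias delta(x) standing in for missing physics.
    return 0.05 * x

def synthetic_data(x, q, sigma, rng):
    # Y_i = f(x_i, q) + delta(x_i) + eps_i, with eps_i ~ N(0, sigma^2) i.i.d.
    eps = rng.normal(0.0, sigma, size=x.shape)
    return model(x, q) + discrepancy(x) + eps

rng = np.random.default_rng(0)
x = np.linspace(0.0, 5.0, 51)
q_true = np.array([2.0, 0.7])
y_obs = synthetic_data(x, q_true, sigma=0.02, rng=rng)

# Residuals against the "true" model still contain the bias delta(x_i).
residual = y_obs - model(x, q_true)
print(residual.mean(), discrepancy(x).mean())
```

Averaging more measurements shrinks the contribution of the noise $\varepsilon_i$ but not that of the systematic bias $\delta(x_i)$. This is why the residual mean tracks the discrepancy rather than zero.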

In the following description, we indicate associated chapters in parentheses. The complete predictive estimation process is depicted in Figure 1.3.


Predictive Estimation Process 


• Input Representation (Chapter 5): One must first represent inputs in a probabilistic framework that facilitates model calibration and uncertainty propagation. This includes the construction of finite-dimensional representations for distributed or spatially varying parameters or initial conditions.

• Parameter Selection (Chapter 6): For many applications, the parameter dimension can be very large (e.g., p = 100 or more), which prohibits direct model calibration. Furthermore, parameters in many models are unidentifiable in the sense that they cannot be uniquely identified from the measured response. For such applications, one must employ parameter selection techniques to isolate a subset of $\tilde{p} < p$ most influential parameters that are employed for subsequent analysis. This relies on the contents of Chapters 14 and 15.

- Local Sensitivity Analysis (Chapter 14): Local sensitivity analysis focuses on the variability of the response as inputs are varied about a nominal value, as quantified by the derivative $\partial y / \partial q_i$ evaluated at that nominal value.

- Global Sensitivity Analysis (Chapter 15): Global sensitivity analysis quantifies how uncertainty in model responses can be apportioned to uncertainties in inputs. We consider variance-based Sobol methods and screening algorithms based on approximation of the local index evaluated at random values throughout the admissible parameter space.

Figure 1.3. Components of the predictive estimation process and relevant chapters. Model calibration and uncertainty propagation are the driving objectives. The remaining topics are required to achieve these objectives for large-scale applications.


• Surrogate Models (Chapter 13): For the large-scale applications detailed in Chapter 2, the complexity of simulation models generally precludes their direct use for model calibration and uncertainty quantification. This is addressed by constructing surrogate models that encapsulate the primary behavior of the modeled process but are sufficiently efficient for model calibration, uncertainty propagation, and control implementation.

- Stochastic Spectral Methods (Chapter 10): Stochastic Galerkin, collocation, or discrete projection methods provide one framework for computing surrogate models.

- Sparse Grid Methods (Chapter 11): Sparse grid methods are required to implement stochastic spectral methods and for direct implementation of Bayesian model calibration methods when parameter dimensions are moderate, e.g., p = 8-50.

• Model Discrepancy (Chapter 12): For applications that exhibit epistemic or systematic uncertainties due to model discrepancy or numerical errors, one must quantify the bias term δ(x_i) in (1.5) using physical, mathematical, or statistical analysis.

• Model Calibration (Chapters 7 and 8): Frequentist or Bayesian techniques are used to quantify the uncertainties associated with inputs q based on measured data y. In the Bayesian framework, inputs are formulated as random variables with associated pdfs. For moderate input dimensions p, the sparse grid techniques of Chapter 11 can be used to directly evaluate Bayes' relation to avoid sampling-based Metropolis algorithms.

• Uncertainty Propagation (Chapter 9): The final objective of predictive estimation is to propagate input uncertainties through models to construct prediction intervals or pdfs for QoI. For linearly parameterized problems, one can establish analytic relations for statistical moments. For mildly nonlinear problems, linearization using Taylor expansions can achieve the same objective. More generally, one can employ stochastic polynomial or sampling-based methods, such as those employed for model calibration, to compute moments or construct prediction intervals for QoI.
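For the linearly parameterized case mentioned under Uncertainty Propagation, the moments follow directly. If $y = Aq$ with $\mathbb{E}[q] = \mu$ and $\mathrm{Cov}(q) = \Sigma$, then $\mathbb{E}[y] = A\mu$ and $\mathrm{Cov}(y) = A \Sigma A^T$. The Python sketch below compares these analytic moments with a sampling-based (Monte Carlo) estimate. The matrix $A$ is hypothetical, and the inputs are assumed to be Gaussian.

```python
import numpy as np

# Hypothetical linearly parameterized model y = A q.
A = np.array([[1.0, 2.0],
              [0.5, -1.0],
              [3.0, 0.0]])
mu_q = np.array([1.0, 2.0])             # assumed input mean
cov_q = np.array([[0.04, 0.01],
                  [0.01, 0.09]])        # assumed input covariance

# Analytic moments of a linear map.
mean_y = A @ mu_q
cov_y = A @ cov_q @ A.T

# Sampling-based propagation for comparison (assumes Gaussian inputs).
rng = np.random.default_rng(1)
q_samples = rng.multivariate_normal(mu_q, cov_q, size=200_000)
y_samples = q_samples @ A.T
mc_mean = y_samples.mean(axis=0)
mc_cov = np.cov(y_samples, rowvar=False)

# 95% prediction intervals; valid here because Gaussian q gives Gaussian y.
half_width = 1.96 * np.sqrt(np.diag(cov_y))
intervals = np.column_stack([mean_y - half_width, mean_y + half_width])
print(intervals)
```

The Gaussian assumption is needed only for the 1.96 interval factor. The mean and covariance formulas hold for any input distribution with finite second moments.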

For large-scale applications, one must typically employ model selection techniques, address model discrepancies, and construct surrogate models before model calibration and uncertainty propagation can be achieved. For applications with identifiable parameter sets, negligible epistemic uncertainties, and highly efficient simulation codes, however, one can focus immediately on the model calibration and uncertainty propagation techniques detailed in Chapters 7, 8, and 9. We refer readers to [52] for details regarding the estimation of the validation regime.
Large-Scale Applications


In this chapter, we illustrate five applications where model predictions with quantified uncertainties are critical for understanding and predicting scientific phenomena and making informed decisions and designs based on these predictions. These applications are weather models, climate models, subsurface hydrology and geology models, nuclear reactor designs, and models for biological phenomena.


2.1 Weather Models 

If asked to list areas of science in which uncertainty quantification is critical for pre- 
dictive estimation, most people would likely include weather forecasting. Moreover, 
some would do so with a sense of derision while noting the occasional inaccuracy 
of forecasts. This negativity is due in part to the fact that the role of uncertainty in scientific predictions receives little attention in secondary and undergraduate curricula, which results in a potentially false sense of security regarding scientific predictions. This leads to a poor understanding of weather forecasts (e.g., surveys reveal that many people interpret a 50% chance of rain as meaning that half the viewing region will receive precipitation) and a general distrust of the science associated with weather forecasting. When one considers the complexity of the
underlying phenomena and associated models, the inherently chaotic or unstable 
nature of the prediction process, and the uncertainties associated with the models, 
simulation codes, and data, the accuracy of forecasts with quantified uncertainties 
represents a major tour de force of physical modeling and scientific computation. 

To motivate the complexity of modeled phenomena, one need only consider the factors required to predict temperatures, precipitation, and winds. As illustrated in Figure 2.1, temperature in the atmosphere depends on the absorption and emission of radiation, latent heat release, advection due to winds, and convective heating or cooling at the earth's surface. The temperature field thus depends on the horizontal and vertical distribution of small particulates and liquid droplets (excluding cloud droplets and precipitation) that are collectively termed aerosols. Additionally, it depends on wind patterns that produce warm and cold air advection, and on the surface topography and heat profile. Furthermore, phase transitions between liquid, solid, and vapor phases for atmospheric moisture add or remove latent heat from the atmosphere, thus requiring that these effects be coupled with temperature models.

Figure 2.1. Physical processes that must be incorporated in weather and climate models and the associated 3-D grid. Image courtesy of NOAA.

Aerosols are associated with atmospheric conditions, such as dust or smog 
levels, and are catalysts for phenomena such as cloud formation since they serve as 
condensation nuclei around which cloud droplets form. The smallest aerosols have 
radii on the order of 0.1 μm and are typically attributed to chemical conversion of sulfate gases to liquids or solids. First principles models for these processes thus occur on relatively small space and time scales and require quantification of the associated chemical processes. Larger aerosols include wind-driven dust particles, particulates from volcanic reactions, and combustion byproducts.

Changes in temperature produce pressure gradients that in turn generate air movement and wind. Near the earth's surface, wind flow is significantly influenced by the terrain and surface conditions such as temperature. This produces highly complex, coupled wind patterns and boundary-layer effects. At higher altitudes, the situation is slightly less complicated but is still fully coupled with all of the previously mentioned phenomena. Hence high-altitude flow patterns, such as the jet stream, tend to be somewhat more stable than surface wind patterns, but they still exhibit highly turbulent dynamics, instabilities, and bifurcations due to the highly nonlinear and complex coupling with atmospheric conditions.

2.1.1 Conservation Relations 

The physical components of meteorological or weather models are based on conservation of mass, momentum, and energy in combination with conservation of water phases and constituent chemical species in aerosol models.

As detailed in [193, 213], conservation of energy using the first law of thermodynamics yields the partial differential equation

$$\rho c_v \frac{\partial T}{\partial t} + p \nabla \cdot v = -\nabla \cdot F + \nabla \cdot (k \nabla T) + \rho\, q(T, p, \rho), \tag{2.1}$$

where $T$ and $v$ are the temperature and velocity at a point $(x, y, z) \in \mathbb{R}^3$, and $\rho$, $c_v$, $p$, and $k$ respectively denote the density, specific heat at constant volume, pressure, and thermal conductivity for an infinitesimal atmospheric volume $V$. Furthermore, $F$ is the net radiative flux and $q(T, p, \rho)$ is the rate of internal heating or cooling associated with processes such as latent heat release due to phase changes in atmospheric moisture. The pressure, temperature, and density are typically related by the ideal gas law

$$p = \rho R T, \tag{2.2}$$

where $R$ is the specific gas constant for air.
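As a quick numerical check of the ideal gas law, the density follows from $\rho = p/(RT)$. The sketch below assumes the dry-air value $R \approx 287.05$ J/(kg K), which ignores moisture effects.

```python
R_DRY_AIR = 287.05  # J/(kg K); approximate specific gas constant for dry air (assumed)

def density_from_ideal_gas(p, T, R=R_DRY_AIR):
    """Density rho = p / (R T) from the ideal gas law p = rho R T.

    p: pressure in Pa, T: temperature in K; returns kg/m^3."""
    return p / (R * T)

rho = density_from_ideal_gas(101325.0, 288.15)
print(f"{rho:.3f} kg/m^3")
```

At sea-level standard conditions (101325 Pa, 288.15 K), this gives approximately 1.225 kg/m^3.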

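Atmospheric flow dynamics, such as the jet stream and local wind patterns, are quantified using conservation of mass and momentum, which yields the relations

$$\frac{\partial \rho}{\partial t} + \nabla \cdot (\rho v) = 0 \tag{2.3}$$

and

$$\frac{\partial v}{\partial t} = -v \cdot \nabla v - \frac{1}{\rho} \nabla p - g \mathbf{k} - 2\Omega \times v. \tag{2.4}$$

Here $g\mathbf{k}$ is the force per unit mass due to gravity, with $\mathbf{k}$ the vertical unit vector, and $-2\Omega \times v$ is the Coriolis force due to the earth's rotation, with $\Omega$ the earth's angular velocity. The expansion of the nonlinear term $v \cdot \nabla v$ and the individual equations in spherical coordinates can be found in [196].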

It is shown in [103] that the concentration of water in the solid, liquid, and gaseous phases, and of aerosols, can also be quantified using conservation relations. If we let $m_1$, $m_2$, and $m_3$ denote the mass of the solid, liquid, and gaseous water phases relative to the mass of air in the same volume, the concentration of each phase can be represented by the relation

$$\frac{\partial m_i}{\partial t} = -v \cdot \nabla m_i + S_{m_i}(T, m_i, \chi_j, \rho), \quad i = 1, 2, 3. \tag{2.5}$$
The source or sink terms $S_{m_i}$ quantify the processes that govern phase transitions. These can be extremely complex due to their dependence on aerosol concentrations $\chi_j$ and coupled atmospheric processes. The physics associated with the terms $S_{m_i}$ is often difficult or impossible to establish on the grid scales illustrated in Figure 2.1, thus necessitating phenomenological parameterizations.

Aerosol concentrations are quantified in a similar manner. If we consider $J$ species with concentrations $\chi_j$, conservation principles yield the relations

$$\frac{\partial \chi_j}{\partial t} = -v \cdot \nabla \chi_j + S_{\chi_j}(T, \chi_j, \rho), \quad j = 1, \ldots, J, \tag{2.6}$$

where the source terms $S_{\chi_j}$ incorporate changes in state, chemical reactions, and sedimentation. As with $S_{m_i}$, parameterizations are typically required to construct these terms.
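The water-phase and aerosol relations share the advective form $\partial \chi/\partial t = -v \cdot \nabla \chi + S$. The following Python sketch discretizes a one-dimensional version with a first-order upwind difference and explicit time stepping on a periodic grid. The constant velocity, the optional linear decay standing in for a source term $S_{\chi_j}$, and the grid are illustrative assumptions. Operational models use far more sophisticated schemes.

```python
import numpy as np

def advect_upwind(chi, v, dx, dt, n_steps, decay=0.0):
    """Explicit first-order upwind scheme for d(chi)/dt = -v d(chi)/dx - decay*chi
    on a periodic 1-D grid with constant velocity v > 0."""
    courant = v * dt / dx
    if not (0.0 < courant <= 1.0):
        raise ValueError(f"Courant number {courant:.2f} violates 0 < C <= 1")
    chi = chi.copy()
    for _ in range(n_steps):
        # Upwind difference: for v > 0, information arrives from the left neighbor.
        chi = chi - courant * (chi - np.roll(chi, 1)) - dt * decay * chi
    return chi

nx, length = 200, 1.0
dx = length / nx
x = np.arange(nx) * dx
chi0 = np.exp(-((x - 0.3) / 0.05) ** 2)   # initial Gaussian plume centered at x = 0.3
chi = advect_upwind(chi0, v=1.0, dx=dx, dt=0.5 * dx, n_steps=100)
```

The guard on the Courant number $v\,\Delta t/\Delta x \le 1$ reflects the explicit stability restriction. With no decay, each update is a convex combination of neighboring values, so total mass is conserved and no new maxima appear. The plume is transported to $x \approx 0.55$ while being smeared by numerical diffusion.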

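The set of equations

$$
\begin{aligned}
\frac{\partial \rho}{\partial t} + \nabla \cdot (\rho v) &= 0, \\
\frac{\partial v}{\partial t} &= -v \cdot \nabla v - \frac{1}{\rho} \nabla p - g \mathbf{k} - 2\Omega \times v, \\
\rho c_v \frac{\partial T}{\partial t} + p \nabla \cdot v &= -\nabla \cdot F + \nabla \cdot (k \nabla T) + \rho\, q(T, p, \rho), \\
p &= \rho R T, \\
\frac{\partial m_i}{\partial t} &= -v \cdot \nabla m_i + S_{m_i}(T, m_i, \chi_j, \rho), \quad i = 1, 2, 3, \\
\frac{\partial \chi_j}{\partial t} &= -v \cdot \nabla \chi_j + S_{\chi_j}(T, \chi_j, \rho), \quad j = 1, \ldots, J,
\end{aligned}
\tag{2.7}
$$

are often referred to as the equations of atmospheric physics. If one neglects the species relations for $m_i$ and $\chi_j$ and employs hydrostatic approximations to the momentum equation, one obtains what are often termed the primitive equations for Eulerian fluid motion. Various meteorological and climate models are constructed by employing simplified forms of these relations in combination with parameterizations for $F$, $q$, $S_{m_i}$, and $S_{\chi_j}$.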

We note that meteorological models for tropical dynamics are significantly more complex than those for midlatitude or extratropical regions (i.e., regions poleward of the tropics). In the middle latitudes, the primary source of energy driving wind patterns is temperature-induced pressure gradients in balance with Coriolis forces, and latent heat release and radiative heating are secondary contributors to atmospheric dynamics. Here geostrophic or quasi-geostrophic theory, based on a balance of pressure gradients and Coriolis forces, provides simplified momentum relations for meteorological models.


In the tropics, however, temperature gradients are smaller and latent heat re- 
lease associated with convective cloud systems is a more significant source of energy. 
Moreover, Coriolis forces are also smaller and there is a more significant coupling between atmospheric and ocean temperatures. Meteorological and regional climate models for these regions must thus incorporate the interaction between cumulus convection and mesoscale and large-scale circulations as well as equatorial wave dynamics and air-ocean interactions. Resulting weather phenomena include monsoons, the trade winds, hurricanes, and El Niño. Readers are referred to Chapter 11 of [113] and the included references for more details regarding tropical weather phenomena and associated modeling issues.

2.1.2 Phenomenological Models

To numerically solve the coupled relations (2.7), expressions for the net radiative flux $F$, the rate $q(T, p, \rho)$ due to latent heat release, and the source terms $S_{m_i}(T, m_i, \chi_j, \rho)$ and $S_{\chi_j}(T, \chi_j, \rho)$ must be specified. For phenomena such as radiation transfer, modeling relations can be based on physical principles. However, the terms $S_{m_i}$ and $S_{\chi_j}$ are highly complex and occur on scales that are much smaller than computational grids.
This necessitates the construction of phenomenological models having nonphysical 
parameters that are determined through model calibration via data assimilation or 
from independent experiments. 

To illustrate, it is established in [193] that if $m_2$ represents the concentration of water in liquid phase, then a simplified form of $S_{m_2}$ is

$$S_{m_2} = S_1 + S_2 + S_3 - S_4, \tag{2.8}$$

where $S_1$ represents the conversion of cloud droplets to form raindrops, $S_2$ represents the accretion of cloud water by raindrops, $S_3$ represents the melting of snow or ice to rain, and $S_4$ quantifies the evaporation of rain. An accepted relation for $S_1$ is

$$S_1 = \rho (m_2 - m_t)^2 \left[ 1.2 \times 10^{-4} + \frac{1.569 \times 10^{-12} N_1}{D_0 (m_2 - m_t)} \right]^{-1}, \tag{2.9}$$

where the threshold $m_t$ and the parameters $N_1$ and $D_0$ are constants that must be specified.
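The following Python sketch evaluates the liquid-water budget $S_{m_2} = S_1 + S_2 + S_3 - S_4$ together with a cloud-to-rain conversion term. It assumes that $S_1$ has the Berry-type form $\rho(m_2 - m_t)^2 [1.2 \times 10^{-4} + 1.569 \times 10^{-12} N_1 / (D_0(m_2 - m_t))]^{-1}$ and that the term is switched off when $m_2 \le m_t$. The function names and numerical inputs are hypothetical, and units are left unspecified. The point is that $m_t$, $N_1$, and $D_0$ are exactly the kind of parameters that must be specified or calibrated.

```python
def autoconversion_rate(rho, m2, m_t, n1, d0):
    """Cloud-to-rain conversion S_1 under the assumed Berry-type form
        rho * (m2 - m_t)**2 / (1.2e-4 + 1.569e-12 * n1 / (d0 * (m2 - m_t))),
    taken to be zero when the liquid content m2 does not exceed the threshold m_t.
    Units are left unspecified; n1 and d0 are parameters to be specified."""
    excess = m2 - m_t
    if excess <= 0.0:
        return 0.0
    return rho * excess**2 / (1.2e-4 + 1.569e-12 * n1 / (d0 * excess))

def liquid_source(s1, s2, s3, s4):
    """Net liquid-water source S_{m_2} = S_1 + S_2 + S_3 - S_4."""
    return s1 + s2 + s3 - s4

# Hypothetical inputs, for illustration only.
s1 = autoconversion_rate(rho=1.0, m2=2.0e-3, m_t=1.0e-3, n1=300.0, d0=0.15)
net = liquid_source(s1, s2=0.0, s3=0.0, s4=0.5 * s1)
```

Just above the threshold, $S_1$ grows like $(m_2 - m_t)^3$. As a result, the conversion remains negligible until the liquid content clearly exceeds $m_t$.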

Parameterized phenomenological models are required for a range of atmospheric and terrestrial phenomena, including aerosol-induced cloud formation, reactions that produce aerosols, turbulence at various levels in the atmosphere, and drag and surface effects due to mountains and attributes of the terrain. Quantification of uncertainties in parameters and phenomenological models is necessary for quantifying uncertainties in predictions of QoI.


2.1.3 Simulation Models 


Essentially every weather person refers to predictions resulting from computer models. Care must be exercised when interpreting this phrase, since it really implies simulations obtained using discretized physical models.

All approximation techniques yield atmospheric state values on a 3-D grid such as that depicted in Figure 2.1. The horizontal and vertical grid spacing is determined by a number of factors, including the spatial scales of modeled physics and available computing resources. Local meteorological models employ horizontal grids on the order of 5 km with vertical spacing of approximately 200 m. Global weather and climate models typically employ larger horizontal grids that are on the order of 50-100 km.

The grid sizes required for weather and climate models constitute a significant source of uncertainty since parameterized processes such as aerosol-induced cloud formation, latent heat generation due to cloud formation, and atmospheric turbulence occur on much smaller, subgrid scales.

Various discretization techniques can be employed to approximate the rela- 
tions (2.7). Finite difference techniques are typically employed to discretize vertical 
spatial derivatives, whereas finite difference, or occasionally finite element, tech- 
niques are employed for the horizontal components of regional and some global 
models. Other global models exploit periodicity by employing spectral approxima- 
tion methods. 

Semidiscretization in space yields very large, coupled, vector-valued systems of differential equations that must be numerically integrated forward in time to provide predictions. The use of explicit methods introduces stringent limits on temporal stepsizes due to stability or Courant-Friedrichs-Lewy (CFL) conditions that can limit the utility of algorithms. For example, to quantify gravity waves in jet streams that have maximum velocities of 200 m/s, horizontal grids of 100 km yield maximum time steps of approximately 8 minutes (100 km / 200 m/s = 500 s). To address this, numerical weather prediction (NWP) models commonly employ semi-Lagrangian integration techniques that approximate the path of air parcels (Lagrangian perspective) while predicting values on a fixed (Eulerian) grid. The advantage is that large temporal stepsizes can be employed without loss of stability. Moreover, semi-Lagrangian schemes can be constructed to ensure that species concentrations are conserved during advection.
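Both the CFL arithmetic for explicit schemes and the semi-Lagrangian idea can be checked with a short Python sketch. The first function computes $\Delta t_{\max} = C\,\Delta x / c_{\max}$ with an assumed Courant limit $C = 1$. The second performs one semi-Lagrangian step for one-dimensional constant-velocity advection on a periodic grid: it traces each grid point back to its departure point and linearly interpolates there. The grid, velocity, and initial profile are illustrative assumptions, not an operational scheme.

```python
import numpy as np

def cfl_max_timestep(dx, c_max, courant_limit=1.0):
    """Largest explicit time step allowed by the CFL condition c*dt/dx <= C."""
    return courant_limit * dx / c_max

dt_max = cfl_max_timestep(dx=100e3, c_max=200.0)   # 100 km grid, 200 m/s waves
print(f"{dt_max / 60.0:.1f} minutes")

def semi_lagrangian_step(phi, u, dx, dt):
    """One semi-Lagrangian step for d(phi)/dt + u d(phi)/dx = 0 on a periodic grid.
    Each grid point traces its parcel back to the departure point x - u*dt
    and linearly interpolates phi there."""
    n = phi.size
    shift = u * dt / dx                       # departure offset in grid cells
    x_dep = np.arange(n) - shift              # departure points in grid units
    i0 = np.floor(x_dep).astype(int)          # left neighbor of each departure point
    w = x_dep - i0                            # linear interpolation weight
    return (1.0 - w) * phi[i0 % n] + w * phi[(i0 + 1) % n]

n, dx, u = 100, 1.0, 1.0
dt = 3.5 * dx / u                             # Courant number 3.5, beyond the explicit limit
grid = np.arange(n)
phi0 = np.exp(-((grid - 30.0) / 5.0) ** 2)
phi1 = phi0.copy()
for _ in range(10):
    phi1 = semi_lagrangian_step(phi1, u, dx, dt)
```

The run uses a Courant number of 3.5, far above the explicit limit, yet the solution stays bounded. Linear interpolation adds numerical diffusion, which is one reason practical schemes typically use higher-order interpolation.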

The length of simulations varies for differing models. The UK Met Office Unified Model is run six days into the future, whereas the European Centre for Medium-Range Weather Forecasts' (ECMWF) Integrated Forecast System provides 10-day predictions. The decline of accuracy for longer predictions is further discussed in the context of uncertainty quantification for weather forecasts.


2.1.4 Model Calibration, Data, and Data Assimilation

Meteorological models used for weather forecasting are essentially initial value problems. Based on measurements, one attempts to determine initial conditions and parameters so that models match present atmospheric conditions as closely as possible, as illustrated in Figure 2.2. Data assimilation techniques are used to calibrate models by determining inputs (initial conditions and parameters) based on a wide range of measurements. Models are then numerically integrated forward in time to provide forecasts. Due to the nonlinear nature of the models, they are chaotic in the sense that uncertainties in initial conditions grow in an unstable manner, thus significantly diminishing the accuracy of forecasts for increasingly long future periods. This highlights the necessity of considering ensembles and statistical QoI, such as average temperatures, and quantifying the uncertainty in predictions.
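The unstable growth of initial-condition uncertainty can be seen in miniature with the three-variable Lorenz-63 system, used here only as a hypothetical stand-in for an NWP model. The Python sketch below integrates two trajectories whose initial states differ by $10^{-8}$, using a fourth-order Runge-Kutta method, and records their separation.

```python
import numpy as np

def lorenz63(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz-63 system with its classical chaotic parameters."""
    x, y, z = state
    return np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])

def rk4_step(f, state, dt):
    """One classical fourth-order Runge-Kutta step."""
    k1 = f(state)
    k2 = f(state + 0.5 * dt * k1)
    k3 = f(state + 0.5 * dt * k2)
    k4 = f(state + dt * k3)
    return state + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

def separation_history(x0, perturbation, dt=0.01, n_steps=3000):
    """Distance between a reference trajectory and a slightly perturbed one."""
    a = np.asarray(x0, dtype=float)
    b = a + perturbation
    seps = [np.linalg.norm(b - a)]
    for _ in range(n_steps):
        a = rk4_step(lorenz63, a, dt)
        b = rk4_step(lorenz63, b, dt)
        seps.append(np.linalg.norm(b - a))
    return np.array(seps)

seps = separation_history([1.0, 1.0, 1.0], np.array([1e-8, 0.0, 0.0]))
```

The separation grows by many orders of magnitude before saturating at the size of the attractor. This behavior is why single deterministic runs lose skill and ensembles are used instead.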

A critical component of model calibration via data assimilation is globally distributed and frequently obtained earth-surface and atmospheric data. In the United States, there are approximately 1000 surface measurement locations, whereas the World Meteorological Organization (WMO) maintains approximately 10,000 land sites worldwide. Surface measurements on the ocean are obtained by moored and drifting buoys as well as ships on an array of routes. Atmospheric conditions are provided by radiosondes in weather balloons that rise into the stratosphere, weather satellites, commercial aircraft on prescribed routes, and reconnaissance aircraft that can be sent to regions yielding high impact data, e.g., regions where there is significant uncertainty or storms such as hurricanes.

Figure 2.2. Assimilation period to estimate initial conditions and parameters so the model best fits data with measurement uncertainties. Forecast with quantified uncertainties.

In combination, this yields a fairly rich data environment. From the perspective of model construction, prediction, and uncertainty quantification, however, three properties of the data are important: it is measured on highly irregular grids and is frequently distributed in time, the observations are not direct measurements of state variables, and there is uncertainty associated with all data. The limited accuracy of measuring devices constitutes one source of uncertainty. Second, several sensors are moving, so there are varying degrees of uncertainty associated with the position and time of measurements.

To illustrate issues pertaining to data assimilation, we assume a grid of size 432 × 320 × 50 with a minimum of 8 variables comprising the wind speed, temperature, pressure, and moisture phases. This yields $432 \times 320 \times 50 \times 8 \approx 5.53 \times 10^7$ states $x$. Phenomenological components of the models can easily require in excess of 50 parameters in addition to initial conditions. Since observations are on the order of thousands, the problem is highly underdetermined.
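The state count is the number of grid points times the number of variables per point. The quoted figure of $5.53 \times 10^7$ is reproduced with 8 variables, assumed here to be three wind components, temperature, pressure, and three moisture phases. The observation count below is a hypothetical value consistent with "on the order of thousands."

```python
nx, ny, nz = 432, 320, 50   # grid dimensions from the text
n_vars = 8                  # assumed: 3 wind components, T, p, 3 moisture phases
n_states = nx * ny * nz * n_vars
n_obs = 5_000               # hypothetical observation count ("thousands")

print(f"states: {n_states:.2e}")                   # states: 5.53e+07
print(f"unknowns per observation: {n_states / n_obs:.0f}")
```

With roughly ten thousand unknown states per observation (before counting parameters), the inverse problem cannot be solved from the data alone. This is why prior (background) information enters the assimilation functional.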

To construct a functional to be minimized, we let $y_i$ denote the vector of observations at times $t_i$, where we assume $n$ measurement times. States $x$ are mapped to observations by the operator $H$ so that $y = H(x)$. Prior information $x^b$, obtained from a previous model forecast, is often termed the background. The observation error and background error covariance matrices are denoted by $R_i$ and $B$. Predictions $x_{i+1}$ provided by the NWP model, based on current states $x_i$, can be represented as

$$x_{i+1} = M_{i+1}(x_i), \tag{2.10}$$

where $M_{i+1}$ represents the discretization of the nonlinear model from time $t_i$ to time $t_{i+1}$.
The data assimilation algorithm 4D-VAR is employed at several weather prediction centers, including the ECMWF and the UK Met Office as well as stations in Japan and Canada. In this algorithm, one minimizes the functional

$$J(x_0) = \frac{1}{2}(x_0 - x^b)^T B^{-1} (x_0 - x^b) + \frac{1}{2} \sum_{i=0}^{n} \big(y_i - H(x_i)\big)^T R_i^{-1} \big(y_i - H(x_i)\big) \tag{2.11}$$

subject to the model dynamics $x_{i+1} = M_{i+1}(x_i)$.

A primary difference between 4D-VAR and the earlier algorithm 3D-VAR is the use of adjoints to incorporate the times at which observations are made. Whereas ensemble Kalman filters are being investigated for NWP models, 4D-VAR is presently considered the state of the art.
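To make the 4D-VAR functional concrete, the Python sketch below treats the simplest case: a linear model $M$, a linear observation operator $H$, and a constant $R$. In this case $J$ is quadratic, and its minimizer solves a linear system. The sketch also computes $\nabla J$ with a forward model sweep followed by a backward adjoint sweep, which is the mechanism that lets 4D-VAR account for observation times. The two-variable dynamics, observation operator, and covariances are hypothetical. Operational 4D-VAR uses nonlinear models, tangent-linear and adjoint codes, and iterative minimization rather than a direct solve.

```python
import numpy as np

def cost(x0, xb, B, M, H, R, ys):
    """4D-VAR functional J(x0) for linear dynamics x_{i+1} = M x_i and linear H."""
    Binv, Rinv = np.linalg.inv(B), np.linalg.inv(R)
    d = x0 - xb
    J = 0.5 * d @ Binv @ d                        # background term
    x = x0
    for y in ys:                                  # observations at t_0, ..., t_n
        r = y - H @ x
        J += 0.5 * r @ Rinv @ r                   # observation misfit term
        x = M @ x
    return J

def gradient_adjoint(x0, xb, B, M, H, R, ys):
    """Gradient of J from a forward model sweep and a backward adjoint sweep."""
    Binv, Rinv = np.linalg.inv(B), np.linalg.inv(R)
    xs = [x0]
    for _ in range(len(ys) - 1):
        xs.append(M @ xs[-1])                     # forward run: x_1, ..., x_n
    lam = np.zeros_like(x0)
    for i in reversed(range(len(ys))):
        lam = lam - H.T @ Rinv @ (ys[i] - H @ xs[i])   # observation forcing at t_i
        if i > 0:
            lam = M.T @ lam                       # adjoint step from t_i back to t_{i-1}
    return Binv @ (x0 - xb) + lam

def fourdvar_linear(xb, B, M, H, R, ys):
    """Minimizer of the quadratic J: solve (B^-1 + sum G_i^T R^-1 G_i) x0 = rhs."""
    Binv, Rinv = np.linalg.inv(B), np.linalg.inv(R)
    hess = Binv.copy()
    rhs = Binv @ xb
    Mi = np.eye(len(xb))                          # propagator from t_0 to t_i
    for y in ys:
        G = H @ Mi                                # maps x0 to the predicted observation at t_i
        hess += G.T @ Rinv @ G
        rhs += G.T @ Rinv @ y
        Mi = M @ Mi
    return np.linalg.solve(hess, rhs)

# Hypothetical two-variable damped rotation observed in its first component.
M = np.array([[0.9, 0.2], [-0.2, 0.9]])
H = np.array([[1.0, 0.0]])
B = 0.5 * np.eye(2)
R = np.array([[0.01]])
rng = np.random.default_rng(2)
x, ys = np.array([1.0, -0.5]), []
for _ in range(6):
    ys.append(H @ x + rng.normal(0.0, 0.1, size=1))
    x = M @ x
xb = np.array([0.6, 0.0])                         # background from a "previous forecast"
xa = fourdvar_linear(xb, B, M, H, R, ys)          # analysis of the initial state
```

At the analysis `xa`, the adjoint gradient vanishes to rounding error. A finite-difference check of `gradient_adjoint` against `cost` is a standard way to validate an adjoint code.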

We emphasize that the determination of initial conditions and model parameters consistent with present and subsequent observations is central to NWP. This is in contrast to climate models, which are essentially forced boundary value problems that are run until the transient effects of initial conditions are mitigated.


2.1.5 Sources and Nature of Uncertainties


There are four primary sources of uncertainty or errors in NWP models: model limitations; input uncertainties due to initial conditions, boundary conditions or forcing terms, and parameters; numerical errors; and uncertainties in measured data. When coupled with the highly nonlinear nature of models, these produce response uncertainties that grow as a function of time and must be quantified to provide a meaningful context in which to interpret predictions or forecasts.

Although they are based on physical conservation laws, the continuity, mo- 
mentum, temperature, and species equations (2.7) are still approximations of the 
underlying physical phenomena. Moreover, the phenomenological relations used to model the flux and rate terms F and q and the source terms S_{m_i} and S_{χ_j} are approximate representations for highly complex and often only partially understood physical phenomena such as turbulence, aerosol-induced cloud formation, and resulting latent heat release as precipitation forms. These uncertainties are primarily epistemic, as defined in Definition 1.7.

As illustrated in (2.9), the phenomenological components of the models contain parameters that are often nonphysical and hence cannot be correlated with measured data. Thus both their values and variability must be inferred through model calibration or data assimilation techniques. Even physical parameters such as the thermal conductivity k will exhibit some variability due to varying atmospheric conditions and the fact that they may partially accommodate unmodeled physics. In meteorological models, the initial atmospheric conditions constitute a second critical source of uncertainty that must be inferred from later observations. These uncertainties are aleatoric or stochastic in the sense defined in Definition 1.6.
The numerical discretization of the models introduces uncertainty and errors in two fundamental ways: the solution on a 3-D grid introduces significant uncertainty for parameterizations of subgrid scale physics, and the approximation techniques introduce discretization errors. The first is critical for a number of phenomena, such as turbulence and aerosol-induced cloud formation, which occur at the level of meters, whereas horizontal grids are on the order of tens to hundreds of kilometers. This uncertainty is obviously related to the previously mentioned model uncertainty. Discretization errors can often be asymptotically quantified using theory associated with finite difference, finite element, spectral, or semi-Lagrangian techniques. These errors are typically epistemic in nature.

Uncertainty in the data can arise from two sources: limited accuracy of measurement devices and uncertainty associated with variability in the location and time of moving sensors. The first is often categorized as aleatoric, whereas the second is primarily epistemic.


2.1.6 Role of Uncertainty Quantification in Weather Forecasts 


As illustrated in Figure 2.2, uncertainty quantification for weather forecasting takes place in two steps. In the model calibration step, values and uncertainties associated with inputs, such as initial conditions and parameters, are determined using data assimilation techniques such as 4D-VAR. As detailed in Chapter 8, this is often performed in a Bayesian framework, as evidenced by the background (prior) term in (2.11). In the second step, the calibrated models are run forward in time to provide forecasts with quantified uncertainties.

In the 1970s and 1980s, it was recognized that forecasts obtained with a single model simulation or realization had limited utility due to the inherent uncertainty and chaos induced by the highly nonlinear initial value models. This led to the use of ensemble forecasts, which have been standard since the 1990s. In single model approaches, ensemble forecasts are obtained by running multiple simulations from an individual model with differing initial conditions or parameter values drawn from probability densities constructed during the calibration phase. Using the ensemble predictions, one constructs statistical QoI, such as the average temperature, relative humidity, or projected rain amounts. Using ensemble forecasts, a 50% chance of rain two days in the future means that, given the present atmospheric conditions, half of the simulations predict measurable rain amounts at some random point in the specified area. Improved forecasts can be obtained using multimodel ensemble forecasts, in which ensemble predictions from multiple models are used to construct QoI and uncertainty bounds. The reduction in variability of ensemble forecasts for Hurricane Katrina, obtained with an additional 12 hours of data, is illustrated in Figure 2.3. The hurricane position was very near the mode of the ensemble predictions when it made landfall near New Orleans.
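The ensemble interpretation of a chance of rain can be expressed as an exceedance fraction. The Python sketch below uses a hypothetical 20-member ensemble at a single location and computes the fraction of members exceeding an assumed measurable-rain threshold of 0.254 mm (0.01 in), along with the ensemble mean and spread.

```python
import numpy as np

def exceedance_probability(ensemble, threshold):
    """Fraction of ensemble members whose value exceeds the threshold."""
    ensemble = np.asarray(ensemble, dtype=float)
    return np.mean(ensemble > threshold)

# Hypothetical 20-member ensemble of 48-h rain totals (mm) at one location.
rain_mm = [0.0, 0.0, 0.3, 1.2, 0.0, 2.5, 0.0, 0.8, 0.0, 4.1,
           0.0, 0.2, 0.0, 0.0, 3.3, 0.6, 0.0, 1.9, 0.0, 0.1]

p_rain = exceedance_probability(rain_mm, threshold=0.254)  # assumed "measurable" cutoff
mean_rain = np.mean(rain_mm)                               # ensemble-mean QoI
spread = np.std(rain_mm, ddof=1)                           # ensemble spread
print(f"chance of rain: {p_rain:.0%}, mean {mean_rain:.2f} mm, spread {spread:.2f} mm")
```

For this hypothetical ensemble, 8 of the 20 members exceed the threshold, which gives a 40% chance of rain. Pooling members from several models in the same way yields a multimodel ensemble estimate.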


Whereas uncertainty bounds or probability densities for QoI are constructed during the ensemble computations, they typically are not reported in forecasts. One exception is the prediction of large storms such as cyclones, tropical storms, or hurricanes. In these cases, forecasts usually include both the predicted trajectory and cones of uncertainty, as illustrated in Figure 2.4.
Figure 2.3. Ensemble forecasts made at (a) 00 UTC and (b) 12 UTC, prior to Katrina's landfall near New Orleans at 12 UTC on August 29, 2005.

2.1.7 References 

The topic of weather modeling and prediction is vast, and we have summarized only some aspects of the discipline to illustrate the role of uncertainty quantification for predictive estimation. Additional discussion regarding basic weather phenomena can be found in [28, 173], whereas detailed derivations of the atmospheric physics relations (2.7) and parameterized constitutive relations are provided in [123, 193, 213]. These references also contain an overview of numerical techniques for simulating weather models, including semi-Lagrangian integration techniques. The implementation of data assimilation algorithms, such as 4D-VAR, is addressed in [121, 182], and [129, 150] provide further details about ensemble forecasting. Finally, the reader is referred to [263] for discussion regarding statistical issues and techniques associated with weather modeling and prediction.



Figure 2.4. NOAA image of the trajectory and cone of uncertainty for Katrina.
2.2 Climate Models

The impression many people have of climate models is that they are simply weather
models extended to much longer timescales of years, decades, centuries, or millennia.
They subsequently conclude that since two-week weather predictions are typically
poor, the accuracy of climate models must therefore be very suspect. Whereas
climate and weather models both require the quantification of coupled atmospheric,
oceanic, land, and other processes, the vastly differing timescales dictate that
different physical mechanisms be emphasized in the two modeling regimes.

As detailed in Section 2.1, weather models are highly nonlinear initial value
problems whose chaotic nature necessitates ensemble computations to predict
statistical QoI whose accuracy is typically limited to 10 days to two weeks. Hence
they are based on atmospheric conditions such as wind patterns, radiative, convective,
and latent-heat-driven temperature changes, and aerosol and moisture levels on
fairly short timescales. They can thus neglect seasonal effects, long-term
anthropogenic and natural forcing terms such as increased CO2 levels and volcanic ash,
and influences such as deforestation. The emphasis is to use data assimilation
techniques such as 4D-VAR to determine initial atmospheric conditions and parameter
values so that models match recent and current conditions in a statistically
accurate manner. Ensemble predictions are then used to compute future QoI such as
temperature, precipitation levels, and storm tracks.

Figure 2.5. Earth's energy budget, modified from [137].

Climate models differ in the sense that they are required to accurately maintain,
for decades up to centuries, a balance between absorbed solar energy and
lower-frequency infrared radiation emitted to space, typically termed the earth's
global energy balance or energy budget. As illustrated in Figure 2.5, this energy budget
is influenced by numerous natural and anthropogenic factors, including greenhouse
gas levels, seasonal effects, volcanic eruptions, deforestation, ocean dynamics, and
polar ice coverage. The timescales dictate that the transient effects due to initial
atmospheric and terrestrial conditions are essentially negligible. Instead, computation
of the energy budget requires quantification of energy fluxes at the earth's
surface and sources and sinks in the atmosphere. Hence climate models exhibit
the dynamics of forced boundary value problems. One ramification is that chaotic
dynamics associated with weather models are largely mitigated in climate models.
As with weather models, one goal is to compute statistical QoI, such as long-term
change in CO2 or temperature levels, with quantified uncertainties. The scope of
climate models is much broader, however, since it additionally involves questions
such as the following:


• Is the planet getting warmer and are manmade processes the cause? 

• Are atmospheric and/or oceanic circulation patterns or currents changing and, 
if so, what will be the effect? 

• Are the weather and climate becoming more extreme or variable and, if so,
what are the ramifications?


Answers to these questions, with quantified uncertainties, are required to
address societal questions such as the following:

• Will climate changes lead to improved agriculture and food supplies or reduced
supplies due to widespread drought?

• Will sea-level changes threaten large civilization centers?

• Will changes in ozone levels significantly increase the incidence of cancer?


In subsequent discussion, we highlight ways in which uncertainty quantification
is critical for obtaining climate predictions with uncertainties quantified in a
manner that informs both scientists and policy makers. 


2.2.1 Climate Forces and Feedback 


The equations (2.7) quantify the basic processes associated with atmospheric physics. 
The difference in how those relations are used to construct weather and climate
models lies in the simplifying assumptions and parameterized phenomenological
models used to quantify the net radiative flux F, rate of internal heating q, and
source terms in (2.7). We summarize here the basic physical phenomena and the
uncertainty associated with the radiative fluxes since these are central to the global
energy balance that drives climate models.

We illustrate in Figure 2.5 factors that influence the balance between solar
energy absorbed by the atmosphere and earth and infrared radiation emitted to
space. As detailed in [157], approximately 342 W/m² of solar radiation enters the
earth's atmosphere, where it is absorbed or reflected by the atmosphere, ground,
ocean, or surface ice. In the atmosphere, the primary absorbers are water vapor in
the troposphere and ozone in the stratosphere. Clouds are the primary mechanism
that scatters and reflects radiation. On the earth's surface, oceans and phenomena
such as deforestation and changing polar icecaps directly affect absorption and
reflection rates. Visible light constitutes the primary wavelength reaching the earth's
surface, with lesser amounts of UV and near-infrared (heat) radiation.
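The balance between absorbed solar and emitted infrared radiation, and the earlier observation that climate models are largely insensitive to initial conditions, can both be seen in a zero-dimensional energy balance model. This is a standard textbook simplification, not a climate model of the type discussed in this section: C dT/dt = (1 − α)Q − εσT⁴, with Q = 342 W/m² taken from the text. The albedo α = 0.30, effective emissivity ε = 0.61, and heat capacity C = 4.0e8 J/(m² K) (roughly a 100 m ocean mixed layer) are assumed illustrative values. Runs started from very different initial temperatures are integrated with forward Euler steps.

```python
import numpy as np

SIGMA = 5.670374419e-8  # Stefan-Boltzmann constant, W m^-2 K^-4
Q_IN = 342.0            # global-mean incoming solar radiation, W m^-2 (from the text)

def equilibrium_temperature(albedo=0.30, emissivity=0.61):
    """Temperature at which absorbed solar and emitted infrared fluxes balance."""
    return ((1.0 - albedo) * Q_IN / (emissivity * SIGMA)) ** 0.25

def integrate_ebm(T0, albedo=0.30, emissivity=0.61, heat_capacity=4.0e8,
                  years=100.0, dt_days=30.0):
    """Forward-Euler integration of C dT/dt = (1 - albedo) Q - emissivity sigma T^4.

    All parameter values are assumed for illustration only.
    """
    dt = dt_days * 86400.0                      # time step in seconds
    T = np.asarray(T0, dtype=float)
    for _ in range(int(years * 365.0 / dt_days)):
        net_flux = (1.0 - albedo) * Q_IN - emissivity * SIGMA * T**4
        T = T + dt * net_flux / heat_capacity
    return T

T_star = equilibrium_temperature()
T_final = integrate_ebm([220.0, 288.0, 330.0])  # three very different initial states
print(round(T_star, 1), np.round(T_final, 1))
```

With these assumed values the equilibrium is close to 288 K (about 15 °C), and all three runs forget their initial temperatures within a few decades. This is the behavior of a forced problem, in contrast to the sensitive dependence on initial conditions that makes weather ensembles necessary.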
Radiation due to heat at the earth's surface and reflected solar radiation both
contribute to the longer-wavelength infrared radiation that is emitted into space.
The amount of infrared radiation is governed by cloud and water vapor levels along
with greenhouse gas concentrations.

Latent heat associated with cloud formation and precipitation constitutes the 
primary nonradiative heat source in the atmosphere. A significant emphasis in
climate modeling focuses on quantifying the mean behavior and uncertainties
associated with these radiative and nonradiative processes.

Climate forcing mechanisms are defined as changes imposed on the earth's
energy balance that produce changes in the climate. These can include external
changes due to variability in the earth's orbit, fluctuations in solar radiation, and
comet or meteor impacts, or internal factors such as volcanic eruptions,
deforestation, or changes in aerosol and CO2 levels. We note that the internal forces can be
both natural and human-induced. Feedback processes are those in which changes
in the climate state serve as forces that produce further climate changes.
Examples include changes in cloud cover due to aerosols, changes in surface reflection
due to melting polar icefields, and changing greenhouse gas levels due to increased
temperature-induced evaporation. We point out that all of the climate forcing and
feedback mechanisms are quantified using phenomenological models, often having
a large number of nonphysical parameters. This introduces significant uncertainty
that must be quantified in final climate model predictions.

Various natural and human-induced forcing mechanisms, along with uncertainties
and a qualitative indication of the level of scientific understanding, are
illustrated in Figure 2.6. We summarize next aspects of these mechanisms and
indicate how associated uncertainties influence climate models and predictions.


Natural External and Internal Forcing Mechanisms 

Whereas we cannot control natural forces, their quantification is necessary to
determine their relative influence compared with anthropogenic forces that we can
control.

Solar Radiation

There are two primary mechanisms that affect the level of solar radiation
entering the earth's atmosphere: variability in the earth's orbit and fluctuations in
solar activity. The periodicity of sunspot activity, with approximately 11-year cycles,
has long been known, and the resulting variation in solar radiation is incorporated
in models. However, the effect of this variability on weather and climate is still
debated. Since the 1970s, radiation data have been measured by satellites, and
long-term variability is inferred from carbon data in tree rings. Sources of uncertainty
include the parameterized models and measured data.

Volcanic Eruptions

The eruption of Krakatoa in 1883 caused average global temperatures to drop
by approximately 1 °C in the subsequent year and produced noticeable variability
in weather for approximately 5 years. The eruption of Mount Pinatubo in 1991
occurred after the advent of a global monitoring network, so it significantly advanced
our understanding of how large volcanic eruptions could force climate changes.
Furthermore, it provided a unique opportunity to advance and test volcanic inputs
to climate models.

Figure 2.6. Contribution and uncertainties associated with climate forcing mechanisms
(radiative forcing of climate between 1750 and 2005, by radiative forcing term), from
the 2007 IPCC Report [228, Figure SPM.2]. Level of scientific understanding:
long-lived greenhouse gases, high; ozone, medium; surface albedo, medium-low;
total aerosol, medium-low; solar irradiance, low.


Volcanic forcing is due to the high levels of particulates and gases that are
introduced into the atmosphere. The manner in which these aerosols affect the
earth's energy balance is largely dependent upon the height to which they are
injected. Nonabsorbing aerosols reduce the amount of solar radiation that reaches
the earth's surface, whereas greenhouse effects are increased by aerosols that absorb
and emit in the infrared spectrum. The aerosols produced by Mount Pinatubo
reduced the solar energy reaching the earth's surface by 3-4 W/m² and cooled
global temperatures by approximately 0.5 °C, as illustrated in Figure 2.7. The
climate changes due to volcanoes are relatively short-term unless they occur in
conjunction with other climate forces or feedbacks.
Figure 2.7. Measured and predicted tropospheric temperatures surrounding the
June 15, 1991 Mount Pinatubo eruption; data from [106].

Human-Induced Forcing Mechanisms


The quantification of anthropogenic climate forcing mechanisms is of
fundamental importance since we can control them.

Greenhouse Gas Emissions

As illustrated in Figure 2.5, the greenhouse effect is the process in which
thermal radiation from the earth's surface is absorbed and reflected by greenhouse
gases such as water vapor, CO2, methane, ozone, and chlorofluorocarbons (CFCs)
such as the refrigerant Freon. Because much of the thermal energy is radiated
back to earth, increased levels of greenhouse gases produce an elevation in average
surface temperatures. The name is somewhat of a misnomer since warming is due
to changes in the absorption and reflection of radiated thermal energy rather than
restriction of convective heat loss, as is the case in a glass greenhouse.

Warming due to increased greenhouse gas levels constitutes one of the most
heavily studied areas of anthropogenic climate forcing, and it was stated in the 2007
IPCC Assessment Report that “It is very likely that greenhouse gas forcing has been
the dominant cause of the observed warming of globally averaged temperatures in
the last 50 years” [228].

Because CO2 is the most abundant greenhouse gas after water vapor and
because it is the byproduct of burning fossil fuels, its concentration levels have been
extensively studied and documented. CO2 concentrations monitored at Mauna
Loa, Hawaii, since 1958 are plotted in Figure 2.8(a). The fluctuations have not
adequately been explained but are likely due to complicated feedback with
short-term atmospheric conditions.

Ice-core data have been used to establish CO2 concentrations for the past
800,000 years at locations such as Antarctica and Greenland. Figure 2.8(b)
illustrates the increase of CO2 concentrations for the last 2000 years. Because measured
carbon isotopes can be used to establish the sources, the figure also illustrates that
the increase in CO2 levels since the Industrial Revolution in the mid-nineteenth
century is largely influenced by CO2 emission from fossil fuels.

Based on data published by [164], the IPCC published Figure 2.9 in its 2001 report;
the figure illustrated reconstructed temperatures from 1000 to 1980, measured
temperatures from 1902 to 1999, 40-year smoothed averages, and two standard error
limits [115].

Figure 2.8. (a) Concentration of CO2 measured at Mauna Loa; data from
http://www.esrl.noaa.gov/gmd/ccgg/trends/. (b) Concentrations of greenhouse gases
over the last 2000 years; data from the 2007 IPCC report [228, Figure SPM.1].


The reconstructed temperatures are based on tree ring data, coral, ice core
measurements, and historical records. CO2 and temperature results of the type plotted
in Figures 2.8 and 2.9 have been cited by numerous scientists and policy makers
as evidence that increasing anthropogenic CO2 levels are leading to unprecedented
temperature increases, which in turn produce global warming.
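The 40-year smoothed averages and two standard error limits used in Figure 2.9 can be sketched as follows. This is a hypothetical illustration on synthetic data, not the procedure of [115] or [164]: each point is the mean of a 40-year window, and the standard error is the sample standard deviation divided by √40. That formula assumes independent annual values and therefore understates the uncertainty of autocorrelated proxy series.

```python
import numpy as np

def smoothed_with_se(values, window=40):
    """Means of consecutive full-length windows with +/- 2 standard-error limits.

    The standard error s / sqrt(window) assumes independent samples within a window,
    which is an illustrative simplification.
    """
    values = np.asarray(values, dtype=float)
    n_windows = values.size - window + 1
    means = np.empty(n_windows)
    stderr = np.empty(n_windows)
    for i in range(n_windows):
        w = values[i:i + window]
        means[i] = w.mean()
        stderr[i] = w.std(ddof=1) / np.sqrt(window)
    return means, means - 2.0 * stderr, means + 2.0 * stderr

# Synthetic anomaly series for the years 1000-1980 (not real proxy data).
rng = np.random.default_rng(1)
years = np.arange(1000, 1981)
anomalies = -0.2 + 0.15 * rng.standard_normal(years.size)
mean, lower, upper = smoothed_with_se(anomalies)
print(mean[-1], lower[-1], upper[-1])
```

Plotting `lower` and `upper` alongside `mean` produces the kind of band shown in Figure 2.9; as discussed next, omitting such bands makes comparisons with present temperatures look more decisive than the data alone support.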

While accepted by a large percentage of the scientific community, this topic is
still highly controversial, and the arguments have been wide-ranging, encompassing
the data, conclusions, future predictions and resulting policies, and even the
fundamental viability of mathematical and statistical models as a means of providing
meaningful climate predictions. While the sources of scientific disagreement are
numerous, nearly all can be distilled to differing assumptions employed in statistical
analysis or differences in how uncertainties are quantified (or neglected) in data,
models, and predicted QoI. We summarize only a few representative examples of
where uncertainty must be adequately quantified and incorporated to establish and
predict the degree of anthropogenic global warming.

Figure 2.9. Proxy temperatures from 1000 to 1980, measured temperatures from
1902 to 1999, 40-year smoothed averages, and two standard error limits; from the 2001
IPCC Report [115, SPM, Figure 1].


• The ice core and tree ring analyses, used to establish past CO2 and
temperatures, exhibit varying degrees of uncertainty or inaccuracy. For example,
it is detailed in [49] that unaccommodated age-dependent variability in tree
ring widths, for broad-leaved species such as oaks, can flatten longer-term
climate fluctuations and potentially produce misleading data. Termed the segment
length curse, this can cause serious underestimation of climate variability
when the age of ancient timbers cannot be established through independent
means. The initial debate regarding the accuracy of the temperature “hockey
stick” plot centered on the statistical methods used to construct proxy past
temperatures based on statistically blended tree ring measurements.

• In many cases, error bars or uncertainties are omitted, which misrepresents
the validity of past values or future predictions. For example, the standard
error limits shown in Figure 2.9 are not plotted in many of the reports that
use this figure to argue that present temperatures are significantly higher
than during the past millennium. When viewed with uncertainty measures on
past proxy values, this conclusion is less dramatic. Moreover, when viewed
with error measurements, it is difficult to correlate Figure 2.9 with reported
past climate variations such as the Medieval Warm Period from roughly 950 to
1250, when Greenland was colonized by the Vikings, and the Little Ice Age
from 1400 to 1700. However, the global nature of these events has not been
established and is debated.


• Most greenhouse gas analysis focuses on the role of CO2 and methane, since
they are fossil fuel combustion byproducts, with some analysis of ozone and
CFCs. However, water vapor is by far the most abundant greenhouse gas.
Significantly less effort has focused on accurate measurement of water vapor
over the last millennium. Moreover, it was noted in Section 2.1 that,
because moisture phase transitions exhibit complex aerosol and temperature
dependencies and occur on subgrid scales, phenomenological models having
nonphysical parameters will introduce substantial uncertainties in models and
parameters.


• The specification of which data is utilized and which is neglected has
introduced uncertainty into both present interpretations and future predictions.
The conclusion drawn from Figures 2.8 and 2.9 is that present CO2 levels
and temperatures are higher now than in the past millennium. However, ice
core measurements have demonstrated significantly higher CO2 and temperature
changes over the last 800,000 years, e.g., temperature variations of up
to 15 °C [117, 127]. In this context, variations of 5 °C in the next century
fit well within past levels. It must be noted, however, that whereas the earth
has exhibited significantly more extreme temperatures in the past, it was also
not habitable by humans under those conditions.

• Climate model predictions must incorporate future CO2 levels to predict
greenhouse effects. This requires models that predict the growth of nations
and their economies and technologies since these factors influence fossil fuel
usage. This is very difficult and prone to significant uncertainty. For example,
it is unlikely that models from 1900 would have accurately predicted current
CO2 emissions in China. As detailed in Section 2.2.3, this has led to
predictions based on various population, economic, and technological scenarios.
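The segment length curse described in the first bullet above can be illustrated with a simple experiment. In the hypothetical sketch below, each consecutive 50-year segment stands for one tree whose age cannot be cross-dated, and a separate mean and linear trend is removed from each segment as a stand-in for fitting an individual growth curve. Real standardization methods differ; the biological age trend is deliberately omitted so that only the effect of the per-segment fit on the climate signal is visible.

```python
import numpy as np

def detrend_segments(series, segment_length):
    """Remove a separate mean and linear trend from each consecutive segment,
    mimicking per-tree standardization when tree ages cannot be cross-dated."""
    series = np.asarray(series, dtype=float)
    out = np.zeros_like(series)
    for start in range(0, series.size, segment_length):
        seg = series[start:start + segment_length]
        if seg.size < 2:          # too short to fit a line; treat as fully removed
            continue
        t = np.arange(seg.size)
        coeffs = np.polyfit(t, seg, deg=1)
        out[start:start + seg.size] = seg - np.polyval(coeffs, t)
    return out

years = np.arange(1000)
slow = np.sin(2 * np.pi * years / 500.0)  # century-scale climate variation
fast = np.sin(2 * np.pi * years / 10.0)   # decadal variation
for name, signal in [("500-yr", slow), ("10-yr", fast)]:
    kept = detrend_segments(signal, segment_length=50).std() / signal.std()
    print(name, "fraction of standard deviation retained:", round(kept, 3))
```

The decadal oscillation passes through almost unchanged, whereas the 500-year oscillation is almost entirely removed: variability on timescales longer than the segment length is indistinguishable from the growth trend being fitted, which is why such reconstructions can underestimate long-term climate variability.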


Further details regarding the role of uncertainty quantification in climate
models will be provided in Section 2.2.3.

Aerosol Emission 

It was noted in Section 2.1 that aerosols critically affect cloud formation,
which in turn affects the global energy balance. It was further noted that aerosol
models are complicated by the fact that they contain complex forcing terms that
must be quantified using parameterized phenomenological models; see, e.g., (2.6).
In climate models, predicted aerosol levels are further complicated by the fact that,
like greenhouse gas levels, they rely on socioeconomic growth models. Associated
uncertainties are partially addressed by considering scenarios of the type discussed
in Section 2.2.3.


Deforestation and Desertification

The absorption of CO2 through photosynthesis, and the absorption and emission
of solar energy at the earth's surface, are strongly influenced by the nature of surface
vegetation. At present, approximately 29% of the earth's land surface is forested
and 11% is arable. However, this is changing as forests are cut, especially in
tropical regions such as the Amazon basin in Brazil, as croplands are urbanized,
and as sparse vegetation is grazed in semiarid areas. As with greenhouse gases and
aerosols, models quantifying future deforestation and desertification are highly
uncertain since they depend on demographic and socioeconomic factors.


Climate Feedback Mechanisms 

Forced climate changes are complemented by feedback mechanisms driven by 
changes in climate conditions such as temperature, precipitation levels, or aerosol 
concentrations. 

Ice Albedo Effects

The percentage of solar energy reflected by a substance or object, termed
the albedo, is an important factor in the global energy balance. On the earth's
surface, snow and ice have very different albedos than water and land, and the
former reflect significantly more solar energy than the latter. The dependence of
ice levels on temperatures is a positive feedback mechanism since increasing
temperatures reduce the area covered by snow and ice, which subsequently produces
further increase in temperatures. Whereas ice and snow albedo effects constitute an
important component in the global energy balance, they are also a source of
uncertainty in climate models due to the complexity of the associated physics and its
highly nonlinear coupling in global models.
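The positive feedback described above can be made concrete with a zero-dimensional energy balance (1 − α(T))Q = εσT⁴, with Q = 342 W/m² from earlier in this section. In this hypothetical sketch the albedo decreases linearly with temperature at an assumed rate of 0.002 per kelvin (clipped to [0.2, 0.4]) as ice and snow retreat, and a small reduction in effective emissivity from 0.61 to 0.60 stands in for increased greenhouse gas levels. The equilibrium temperature is found by bisection, and the warming obtained with a fixed albedo is compared with the warming obtained when the albedo responds to temperature.

```python
import numpy as np

SIGMA = 5.670374419e-8  # Stefan-Boltzmann constant, W m^-2 K^-4
Q_IN = 342.0            # global-mean incoming solar radiation, W m^-2
ALBEDO_0 = 0.30         # assumed present-day albedo
T_REF = ((1.0 - ALBEDO_0) * Q_IN / (0.61 * SIGMA)) ** 0.25  # present-day equilibrium

def fixed_albedo(T):
    return ALBEDO_0

def ice_albedo(T):
    """Hypothetical linear ice-albedo feedback: warmer -> less ice -> lower albedo."""
    return float(np.clip(ALBEDO_0 - 0.002 * (T - T_REF), 0.2, 0.4))

def net_flux(T, emissivity, albedo_fn):
    """Absorbed solar minus emitted infrared flux, W m^-2."""
    return (1.0 - albedo_fn(T)) * Q_IN - emissivity * SIGMA * T**4

def equilibrium(emissivity, albedo_fn, lo=200.0, hi=350.0, tol=1e-8):
    """Bisection for net_flux = 0; assumes net_flux(lo) > 0 > net_flux(hi)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if net_flux(mid, emissivity, albedo_fn) > 0.0:
            lo = mid   # still absorbing more than emitting: equilibrium is warmer
        else:
            hi = mid
    return 0.5 * (lo + hi)

warming_fixed = equilibrium(0.60, fixed_albedo) - equilibrium(0.61, fixed_albedo)
warming_ice = equilibrium(0.60, ice_albedo) - equilibrium(0.61, ice_albedo)
print(round(warming_fixed, 2), round(warming_ice, 2))
```

With these assumed numbers the feedback amplifies the fixed-albedo warming of roughly 1.2 K by about a quarter. The amplification depends entirely on the hypothetical slope of 0.002 per kelvin, which is exactly the kind of nonphysical parameter whose uncertainty must be quantified.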

Greenhouse Gas and Water Vapor Levels

We noted previously that water vapor constitutes the most abundant greenhouse
gas. Because increasing temperatures enhance evaporation that increases
greenhouse gas levels, the interdependency between greenhouse gases and
temperatures constitutes a positive climate feedback mechanism in addition to being a
forced response.


2.2.2 Climate Models 

The general framework used to construct climate models is similar to that described
in Section 2.1.1 for weather models. Primitive equations derived from the equations
of atmospheric physics (2.7) are numerically solved on a 3-D grid of the form
illustrated in Figure 2.1. As for weather models, the numerical grid introduces
significant uncertainty for processes such as aerosol-induced cloud formation and
greenhouse gas levels that must be quantified using parameterized phenomenological
models on subgrid scales.

It was noted in Section 2.2 that climate models differ fundamentally from
weather models in the sense that they are forced boundary value problems rather
than initial value problems. This introduces the problem of resolving complex
climate forces and feedbacks over very long timescales (e.g., millennia) but
significantly reduces the chaotic behavior associated with uncertain initial conditions.

Figure 2.10 illustrates the evolution of climate models over the last 30 years.
Whereas ocean currents can be represented by primitive equations constructed
using conservation of mass, momentum, and energy, the majority of processes such
as cloud microphysics, radiation, surface energy fluxes, turbulence, aerosol levels,
chemistry, sea ice formation, and dynamic vegetation levels are quantified using
parameterized phenomenological models. As with weather models, values and
uncertainties for these parameters must be determined using model calibration
techniques.

Figure 2.10. Development of climate models, modified from the 2001 IPCC report
[115].

Climate models are validated by testing their ability to simulate past climatic
events (paleoclimates) such as the Cretaceous and Last Glacial Maximum (LGM)
based on proxy, measured, or estimated climate forces. Their validity is also tested
by simulating climate responses to current forces such as the eruption of Mount
Pinatubo, as illustrated in Figure 2.7.

Examples of present climate models include the NSF-, DOE-, and NASA-sponsored
Community Earth System Model (CESM) and the Hadley Centre model
HadCM3 developed in the United Kingdom. The package CESM 1.0 is publicly
available and includes atmosphere, land, sea ice, ocean, and land ice modules. The
package HadCM3 was highly cited in the 2001 IPCC report [115] and includes
atmospheric and ocean components, including sea ice. As noted in [178], the
atmospheric package HadAM3 has on the order of 100 parameters, of which
approximately 29 are considered to control key atmospheric and surface processes.


2.2.3 Role of Uncertainty Quantification in Climate Modeling

It was detailed in Section 2.2.1 that the quantification of uncertainties in
temperature and greenhouse gas data and models is critical for ascertaining the present
levels of anthropogenic greenhouse effects and predicting future ramifications of
potential warming. We summarize here other ways in which uncertainty quantification
will be critical for making viable climate predictions.

For weather models, input uncertainties consisted primarily of those associated
with initial conditions and nonphysical model parameters that were determined
using data assimilation techniques such as 4D-VAR. For climate models, initial
conditions are replaced by forced boundary conditions that
introduce significant uncertainty for phenomena such as predicted greenhouse gas
and aerosol emissions and rates of deforestation. Models for these phenomena are
highly uncertain since they are based on socioeconomic and demographic factors
that are also highly uncertain. Furthermore, since these forcing mechanisms and
associated feedback loops occur in the future, data assimilation techniques are not
viable for constructing associated error bounds or pdfs. The uncertainties associated
with parameters and models are augmented by the fact that numerous forcing
and feedback mechanisms occur at subgrid scales and require parameterized
phenomenological models to quantify complex or poorly understood physics.

Due to the highly uncertain nature of future aerosol and greenhouse gas emission
levels, the IPCC reports predicted emission levels and climate changes for four
scenarios representing a range of demographic, economic, and technological growth.

• A1: Rapid economic growth with increasingly efficient technology and a
mid-century population peak. A1FI represents fossil-fuel-intensive energy usage,
whereas A1T and A1B respectively represent nonfossil and balanced energy
resources.

• A2: Slow economic and technological growth and large population growth.

• B1: Same global population dynamics as A1 but more rapid change toward a
service and information economy.

• B2: Intermediate population and economic growth with local solutions to
environmental, social, and economic sustainability.


The report does not assign likelihoods to these scenarios. 

The IPCC report [188] lists key uncertainties as well as robust findings.
Representative uncertainties from that report include the following.

• “The magnitude of CO2 emissions from land-use change and CH4 emissions
from individual sources remain as key uncertainties.”

• “Aerosol impacts on the magnitude of the temperature response, on clouds
and on precipitation remain uncertain.”

• “Models differ considerably in their estimates of the strength of different
feedbacks in the climate system, particularly cloud feedbacks, oceanic heat uptake
and carbon cycle feedbacks, although progress has been made in these areas.”

• “Large-scale ocean circulation changes beyond the 21st century cannot be
reliably assessed because of uncertainties in the melt water supply from the
Greenland ice sheet and model response to the warming.”

• “Projections of climate change and its impacts beyond 2050 are strongly
scenario- and model-dependent, and improved projections would require
improved understanding of sources of uncertainty and enhancements in
systematic observation networks.”

• “Understanding of low-probability/high-impact events and the cumulative
impacts of sequences of smaller events, which is required for risk-based
approaches to decision-making, is generally limited.”


Since the objective of climate models is to predict trends for the future, it is
natural to consider statistical QoI such as average temperatures, precipitation levels,
or amounts of sea level rise. The manner in which QoI and associated uncertainties
are reported depends on the highly varied nature of input uncertainties. When
input uncertainties can be reasonably quantified, Monte Carlo simulations from input
densities or confidence intervals are used to construct uncertainty bounds for QoI.
For highly uncertain inputs such as future aerosol or greenhouse gas emissions,
predictions based on the scenarios A1, A2, B1, and B2 are provided. For example,
Figure 2.11, from the 2007 IPCC report [188], illustrates the predicted average change
in global surface temperatures for those scenarios. The scenario A1FI, representing
intensive reliance on fossil fuel energy sources, predicts a best estimate increase of
4 °C by 2100 with a likely assessed uncertainty range of 2.4 to 6.4 °C. To place this
in perspective, it is predicted that a 4 °C temperature rise would cause approximately
35% reduction in African crop production, up to 50% less water available in the
Mediterranean and Southern Africa, coastal flooding that could displace up to 300
million people annually, and loss of up to half the arctic tundra [49]. It is noted
in the report that, due to uncertainties in the ocean-atmosphere couplings, neither
best-estimate predictions nor uncertainty bounds could be reasonably established
for sea level rise in 2100. Instead, ranges are reported.

Figure 2.11. Levels of average global warming predicted using the scenarios A1,
A2, B1, and B2; from the 2007 IPCC report [188, Figure 3.2].
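When input densities are available, the Monte Carlo construction of uncertainty bounds for a QoI described above can be sketched as follows. The model is a hypothetical zero-dimensional energy balance (1 − α)Q = εσT⁴ with Q = 342 W/m² from earlier in this chapter, and the QoI is its equilibrium temperature. The normal densities for albedo and effective emissivity are assumptions for illustration, not values taken from the IPCC reports. Samples are drawn from the input densities and pushed through the model, and the 2.5th and 97.5th percentiles of the outputs give a 95% interval.

```python
import numpy as np

SIGMA = 5.670374419e-8  # Stefan-Boltzmann constant, W m^-2 K^-4
Q_IN = 342.0            # global-mean incoming solar radiation, W m^-2

def equilibrium_temperature(albedo, emissivity):
    """Equilibrium of (1 - albedo) Q = emissivity * sigma * T^4 (vectorized)."""
    return ((1.0 - albedo) * Q_IN / (emissivity * SIGMA)) ** 0.25

def monte_carlo_bounds(n_samples=100_000, seed=0):
    """Propagate assumed input densities to a mean and 95% interval for the QoI."""
    rng = np.random.default_rng(seed)
    # Hypothetical input densities; in practice these would come from calibration.
    albedo = rng.normal(0.30, 0.01, n_samples)
    emissivity = rng.normal(0.61, 0.01, n_samples)
    qoi = equilibrium_temperature(albedo, emissivity)
    lower, upper = np.percentile(qoi, [2.5, 97.5])
    return qoi.mean(), lower, upper

mean_T, lower_T, upper_T = monte_carlo_bounds()
print(f"mean {mean_T:.1f} K, 95% interval [{lower_T:.1f}, {upper_T:.1f}] K")
```

For inputs such as future emissions, for which no defensible density exists, the sampling step has no counterpart; the model is instead run separately under each scenario, which mirrors the scenario-based reporting described above.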

Due to the impact that climate change can have on civilization, it is critical
that physical, biological, and socioeconomic modeling, data collection and
analysis, statistical and mathematical analysis, large-scale computing, and uncertainty
quantification continue to be investigated in concert to advance the state of
climate models so they can inform scientists in a manner that includes quantified
uncertainties. This information must then be conveyed to policy makers and the
general public in a manner that encourages economic and technological growth that
minimizes the degree to which anthropogenic forces accelerate climate change.


2.2.4 Notes and References 

As with weather modeling, we have focused only on aspects of climate modeling
to highlight issues and motivate the central role of uncertainty quantification for
understanding present climate trends and predicting future climate conditions. The
basic atmospheric phenomena are the same as those for weather, and the underlying
physical principles are detailed in [123, 147, 193, 213, 252]. The reader is referred
to [172] for a perspective of research issues at the intersection between weather
and climate and to [163] for an overview of how the stochastic properties of turbulent
dynamical systems pertain to climate models. The texts [28, 49, 169] provide very
readable descriptions of the earth's energy balance, natural and human-induced
causes of climate change, issues associated with proxy measurements obtained using
ice-core and tree ring data, required parameterized components of climate models,
consequences of climate change, and debates pertaining to the subject. Details
regarding various forcing mechanisms can be found in [136]. The 1995, 2001, and
2007 Intergovernmental Panel on Climate Change (IPCC) Assessment Reports on
Climate Change [115, 188, 228] provide a comprehensive overview of the physical
basis for and impact of climate change. These reports list key uncertainties and
robust findings, which are defined as those that hold under a variety of assumptions,
models, or approaches. Although listed as robust findings, conclusions such as
“Anthropogenic warming over the last three decades has likely had a discernible
influence at the global scale on observed changes in many physical and biological
systems” still convey inherent uncertainty.

It was noted that whereas phenomena such as global warming are accepted
by a majority of the scientific community, there is still serious scientific debate
regarding the validity of models, data, and conclusions. Much of this debate has
appeared in the popular literature and is difficult to judge since it does not include strict
scientific rigor. The text [49] presents a fairly balanced presentation of issues that
may detract from the validity of mainstream theory for climate change.


2.3 Subsurface Hydrology and Geology

Uncertainty pervades subsurface hydrology and geology for the simple reason that
the earth's subsurface is largely inaccessible to observation or measurement except
at a limited number of locations or through imaging techniques such as seismic
tomography. This places substantial demand on models and illustrates the criticality
of obtaining accurate predictions with quantified and reduced uncertainties.

Subsurface hydrology addresses issues such as the availability and contamina- 
tion of subsurface groundwater and has the goal of answering questions that include 
the following. 


* Is it safe to sequester carbon dioxide in a depleted oil reservoir?

* How much groundwater is available in an aquifer? (Figure 2.12)


Figure 2.12. Subsurface strata and the manner in which they affect aquifer
properties; from [44].
Figure 2.13. Representation of repositories in (A) saturated zones and (B) unsaturated zones.
* Will hydraulic fracturing (fracking) contaminate water tables in a drilling
region? (Figure 2.14)


* Will microbial action naturally degrade contaminants to specified levels
in a given time frame?


* What is the average time for transport of stored radioactive materials from a
repository to a human environment?


The issues addressed in petroleum geology are similar. 

* Does a given region offer substantial oil or gas reserves? 

* Will drilling for oil have significant environmental risk?


Figure 2.14. Depiction of the potential impact of fracking on aquifers. Image
courtesy of Mike Norton.
2.3.1 Models and Role of Uncertainty


To illustrate issues associated with model and parameter uncertainty, we summarize
a typical model used to characterize subsurface flow and transport with $N$ reactive
species having concentrations $c_n(t,x)$ and mass flux $J_n(t,x)$. As detailed in [245],
the flow equations on a domain $\Omega$ bounded by a surface $\Gamma = \Gamma_D \cup \Gamma_N$, where $\Gamma_D$
and $\Gamma_N$ denote Dirichlet and Neumann segments of the boundary, are


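$$
\begin{aligned}
S_s \frac{\partial h}{\partial t} &= -\nabla \cdot q + f, & q &= -K \nabla h,\\
\frac{\partial (\phi\, c_n)}{\partial t} &= -\nabla \cdot J_n + R_n(c), & J_n &= -D \nabla c_n + q\, c_n, \quad n = 1, \ldots, N,
\end{aligned}
\tag{2.12}
$$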


with initial and boundary conditions 


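$$
\begin{aligned}
h(0,x) &= h_0(x), & h(t,x) &= H(x),\ x \in \Gamma_D, & \hat{\mathbf n} \cdot q &= Q(x),\ x \in \Gamma_N,\\
c_n(0,x) &= c_{n0}(x), & c_n(t,x) &= C_n(x),\ x \in \Gamma_D, & \hat{\mathbf n} \cdot J_n &= Q_n(x),\ x \in \Gamma_N.
\end{aligned}
\tag{2.13}
$$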


Here $h(t,x)$ and $q(t,x)$ denote the hydraulic head (equilibrium water elevation) and
Darcy velocity due to uncertain sources or sinks $f(t,x)$. The specific storage $S_s(x)$,
hydraulic conductivity tensor $K(x)$, porosity $\phi(x)$, and dispersion tensor $D(x)$ are
uncertain properties of the heterogeneous subsurface environment $\Omega$. For example,
gravel, sand, and clay can yield different porosities, as illustrated in Figure 2.15. The
chemical reactions $R_n$ are assumed to have uncertain reaction rates
and uncertain forcing terms. Finally, $h_0(x)$ and $c_{n0}(x)$ denote the initial hydraulic
head distribution and species concentration, and $H(x)$, $C_n(x)$, $Q(x)$, and $Q_n(x)$ are
the hydraulic head, species concentration, and fluxes specified at the boundary.
Figure 2.15. Porosity of various subsurface strata; from [44].
To accommodate these uncertainties, parameters and forces are considered to
be random fields, or random processes if they are time varying. To relate parameter
uncertainties to geologic uncertainties, one can decompose the domain $\Omega$ into $M$
nonoverlapping subdomains $\Omega_m$, so that $\Omega = \bigcup_{m=1}^{M} \Omega_m$, and represent the random
fields on each subdomain; e.g.,

$$
K(x) = \begin{cases} K_1(x), & x \in \Omega_1,\\ \quad \vdots & \\ K_M(x), & x \in \Omega_M. \end{cases}
$$

Uncertainty in the nature of the decomposition and number of subdomains can then
be propagated through (2.12).
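As a minimal sketch of the decomposition idea, the following Python code draws one realization of a piecewise-constant random field, for example a conductivity $K(x)$, on the 1-D interval [0, 1]. The number of subdomains $M$ and the interface locations are themselves random, which mimics uncertainty in the decomposition. The lognormal value on each subdomain, the uniformly distributed interfaces, the bound `max_subdomains`, and the function name `sample_piecewise_field` are all illustrative assumptions, not part of the model in [245].

```python
import numpy as np


def sample_piecewise_field(x, rng, max_subdomains=5, log_mean=0.0, log_std=1.0):
    """Draw one realization of a piecewise-constant lognormal field on [0, 1].

    Both the number of subdomains M and the interface locations are random,
    mimicking uncertainty in the decomposition Omega = Omega_1 U ... U Omega_M.
    """
    M = rng.integers(1, max_subdomains + 1)             # random M in {1, ..., max_subdomains}
    interfaces = np.sort(rng.uniform(0.0, 1.0, M - 1))  # random interior interfaces
    values = np.exp(rng.normal(log_mean, log_std, M))   # lognormal value per subdomain
    # searchsorted returns, for each x, the index of the subdomain containing it
    labels = np.searchsorted(interfaces, x, side="right")
    return values[labels], interfaces, values


rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 201)
K, interfaces, values = sample_piecewise_field(x, rng)
print(f"M = {len(values)}, interfaces = {interfaces}")
```

Repeated calls with the same generator produce an ensemble of fields that can be fed to a solver for (2.12).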

To determine uncertainty bounds or pdf for $h$ or $c_n$, one must first specify
uncertainties or pdf for the parameters and driving forces and then propagate these
uncertainties through (2.12). In theory, one could parameterize the random fields
in space and estimate pdf using model calibration techniques. In practice, however,
one often invokes assumptions such as stationarity and assumes parametric pdf
relations such as normal or lognormal [245]. In the final step of the analysis, best
estimate values and uncertainties for the states $h$ and $c_n$ would be used to compute
best estimate values and uncertainties for QoI necessary to answer the questions
posed at the beginning of this section.
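The propagation step can be illustrated with a deliberately simplified case: steady 1-D flow with no sources through $M$ layers of equal thickness, for which (2.12) reduces to $\frac{d}{dx}\big(K\,\frac{dh}{dx}\big) = 0$ and the Darcy flux is the constant $q = \big(h(0) - h(L)\big) / \int_0^L K^{-1}\,dx$. The sketch below assumes independent lognormal conductivities on the layers and uses Monte Carlo sampling to estimate the mean, standard deviation, and a 95% interval for $q$. The domain length, boundary heads, lognormal parameters, and function names are hypothetical choices for illustration.

```python
import numpy as np


def darcy_flux(K_layers, L=100.0, h_left=10.0, h_right=5.0):
    """Steady 1-D Darcy flux through equal-thickness layers in series.

    With no sources, d/dx(K dh/dx) = 0, so q = -K dh/dx is constant and
    q = (h_left - h_right) / sum(dx / K_m), i.e., K is harmonically averaged.
    Works row-wise on a 2-D array with one realization per row.
    """
    K_layers = np.asarray(K_layers, dtype=float)
    dx = L / K_layers.shape[-1]
    return (h_left - h_right) / np.sum(dx / K_layers, axis=-1)


def monte_carlo_flux(n_samples=20_000, M=4, log_mean=0.0, log_std=0.5, seed=1):
    """Propagate independent lognormal layer conductivities to the flux q."""
    rng = np.random.default_rng(seed)
    K = np.exp(rng.normal(log_mean, log_std, size=(n_samples, M)))
    q = darcy_flux(K)
    lower, upper = np.percentile(q, [2.5, 97.5])
    return q.mean(), q.std(), (lower, upper)


mean_q, std_q, interval = monte_carlo_flux()
print(f"mean q = {mean_q:.4f}, std = {std_q:.4f}, 95% interval = {interval}")
```

Because the layers act in series, the flux depends on the harmonic mean of $K$, so low-conductivity layers dominate the result; this is one reason the choice of pdf for $K$ matters.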



2.3.2 References 


Details regarding the physical phenomena and models for subsurface hydrology and
geology can be found in [194]. The reader is referred to [110] for information
regarding model calibration, sensitivity analysis, and uncertainty quantification for
groundwater and to [44, 245, 260] and the included references for details regarding
Markov chain Monte Carlo analysis and probabilistic risk assessment in the context
of subsurface hydrology. Readers can find additional information regarding the PRA
for nuclear waste disposal at Yucca Mountain in Chapter 3 of [250]. In addition
to geological and geotechnical hazards, this includes aircraft crash hazards,
industrial and military-related activity hazards, and potential hazards due to weather.
Details regarding hydraulic fracturing, its associated models, and its potential
environmental impact can be found in [187, 251, 269]. Finally, the reader is referred
to [109] and the included references for an overview of uncertainty quantification
issues pertaining to groundwater models.


2.4 Nuclear Reactor Design 

Nuclear power constitutes a major source of sustainable energy, as evidenced by
the fact that in the year 2000, there were 434 nuclear power stations producing
350,442 MWe worldwide [233]. This included 252 pressurized water reactors (PWR),
including the Russian VVER reactors, 92 boiling water reactors (BWR), and 34
gas-cooled reactors (GCR). Additionally, there were 34 heavy water-cooled
reactors, 15 graphite-moderated reactors (RBMK: Reaktor Bolshoy Moshchnosti
Kanalniy), and 2 liquid-metal fast breeder reactors (LMFBR). Nuclear power plants
presently provide approximately 19% of the electricity in the United States and
14-15% of the world's electricity.

Although the efficiency of nuclear power plants has improved over the last 30
years, there have not been significant changes in the fundamental designs. Furthermore,
there has been a basic halt in construction of new plants in the United States
for the last 30 years despite an increase in demand from 1980, when nuclear power
plants provided approximately 11% of the nation's electricity.

A number of resources are presently focused on the development of model
and simulation-based design tools to improve reactor efficiency, reduce operating
costs, reduce nuclear waste, and continue to enhance safety. A primary goal of
the Department of Energy (DOE) funded Consortium for Advanced Simulation of
Light Water Reactors (CASL) is to develop a “virtual reactor” environment that can
predict the behavior of and interactions between the nuclear fuel, neutron transport,
heat transfer, and thermal-hydraulic (coolant flow) components of a reactor using
limited experimental measurements. This predictive environment would be used
to improve present systems and enhance the design of next generation reactors.
To motivate the central role of uncertainty quantification in this predictive design
environment, we summarize the basic components of light water reactors and aspects
of the models used to quantify the underlying physics.


2.4.1 Light Water Reactors

Light water reactors employ ordinary water, termed light water, as a coolant and
neutron moderator. This is in contrast to heavy water reactors employed in Canada
(CANDU) and India (AHWR). The two primary designs, PWR and BWR, are
illustrated in Figure 2.16. In the United States, there are presently 104 nuclear
power plants, of which 69 are PWR and 35 are BWR.

In all nuclear reactors, heat is produced by controlled nuclear fission in the
reactor core. As depicted in Figure 2.17, the core contains nuclear fuel rods, control
rods, and water-filled channels. The fuel rods, which are approximately 12 feet long,
contain uranium or uranium oxide pellets. The control rods contain elements such
as cadmium or hafnium that absorb neutrons, thus slowing chain reactions.

In a BWR, heat transferred through the fuel cladding turns the light water to
steam, which directly drives turbines before being condensed using secondary lake,
river, or ocean water. In a PWR, coolant is circulated through the core under high
pressure and remains liquid despite being heated to temperatures on the order of
315°C. This hot primary coolant then flows through a heat exchanger where it
generates steam in an isolated secondary coolant that in turn drives turbines. The
separation of the two coolant sources prevents accidental radioactive contamination
of the secondary coolant source, as was the case in the Fukushima plant, which is a
BWR. We note that an 1100 MWe PWR core can contain over 50,000 fuel rods and
18 million fuel pellets in approximately 193 fuel assemblies.

To sustain a chain reaction, fast fission neutrons must be slowed before they
can interact with the uranium. This is termed moderation or thermalization and, 
in light water reactors, this function is directly provided by the coolant water. In 
the process of colliding with hydrogen atoms in the water, neutrons lose speed until 
Figure 2.16. Schematics of (a) PWR and (b) BWR designs. Images courtesy of
United States Nuclear Regulatory Commission.


their velocity is comparable with the thermal velocity of the nuclei, at which point
they are called thermal neutrons.

An important stability feature of light water reactors is the fact that as
temperatures increase, the water density decreases. This in turn decreases the slowing
of neutrons, which slows the chain reaction. This negative feedback loop means that
if there is a spike in the nuclear reaction, the coolant will moderate it. If coolant is
lost due to an accident, loss of the moderator stops active fission. However, decay
heat, initially approximately 5% of operating power, persists for 1-3 years and is hot
enough to melt the core if sufficient cooling is not provided. While not as dangerous
as active fission, the potential for decay heat is a risk factor associated with LWR designs.
Figure 2.17. Schematic of a nuclear fuel assembly (U.S. Department of Energy).

This is in contrast to the RBMK design used at Chernobyl, which employed
graphite as a moderator. This design is less robust and is subject to rapid transients,
which is considered one of the causes of the Chernobyl disaster.

There are two primary mechanisms for controlling the power generated by a
PWR: insertion or retraction of the neutron absorbing control rods and varying the
concentration of boric acid dissolved in the reactor coolant. The latter acts in a
manner similar to the control rods since boron readily absorbs neutrons. Rather
than using boric acid in the reactor coolant, BWRs adjust the coolant flow rate to
control reactor power.

Further advantages and disadvantages of light water reactor designs, including
their use in nuclear ships and submarines, are detailed in [76, 233].


2.4.2 Reactor Models

Models for components of the reactor core reflect the complexity of the underlying
physics. We summarize only aspects of the neutronics transport equations and
thermal-hydraulic relations to illustrate issues that must be addressed for model
verification, validation, and uncertainty quantification.


Neutron Transport Equation

The quantification of neutron distributions in the reactor core is central to
reactor models since neutron densities and energy levels govern the various nuclear
reactions that occur in the reactor. The neutron interaction with the primary
coolant is also critical since the coolant serves as a neutron moderator and the
neutrons heat the coolant.

To specify the neutron distribution, one typically considers the angular neutron
flux $\varphi(r, E, \Omega, t)$, where $r = (x, y, z)$ is a position vector, $v$ is the velocity,
$\Omega = v/|v|$ is the unit vector that specifies the direction of motion, and $E$ is the energy
level at time $t$. We note that this is a scalar-valued quantity, as compared with
vector-valued electromagnetic fluxes.

To construct the neutron transport equation, one balances neutron sources and
loss mechanisms for an arbitrary volume $V$. Sources include fission, neutrons entering
$V$, and neutrons with different $E', \Omega'$ values that incur a scattering collision that
changes their energy and direction to $E, \Omega$, which we denote by $E' \to E$, $\Omega' \to \Omega$.
Losses are due to neutrons leaving $V$ and neutrons suffering a collision.

As detailed in [76], a balance of the source and loss terms yields the 3-D 
neutron transport equations 


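$$
\begin{aligned}
\frac{1}{v}\frac{\partial \varphi}{\partial t} + \Omega \cdot \nabla \varphi + \Sigma_t(r,E)\,\varphi(r,E,\Omega,t)
&= \int_{4\pi} d\Omega' \int_0^\infty dE'\, \Sigma_s(E' \to E, \Omega' \to \Omega)\, \varphi(r,E',\Omega',t)\\
&\quad + \frac{\chi(E)}{4\pi} \int_{4\pi} d\Omega' \int_0^\infty dE'\, \nu(E')\, \Sigma_f(E')\, \varphi(r,E',\Omega',t),
\end{aligned}
\tag{2.14}
$$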
where $\Sigma_t$ and $\Sigma_f$ are the macroscopic total and fission cross-sections, i.e., the
atomic number density multiplied by the corresponding microscopic cross-section, which
is defined as the ratio of the reaction rate (s$^{-1}$) to incident flux (cm$^{-2}$ s$^{-1}$) when a
beam of particles impinges on a nucleus. The double differential scattering cross-section
$\Sigma_s$ characterizes scatter from $(E', \Omega')$ to $(E, \Omega)$ in the cone $d\Omega$. Finally, $\chi(E)$ and
$\nu(E')$ respectively denote the fission spectrum and average number of fission neutrons
produced by fission resulting from neutrons with energy $E'$.

We note that the integro-differential equation (2.14) is linear in the state $\varphi$
but is a function of seven independent variables in three dimensions: the three spatial
coordinates $r = (x, y, z)$, the energy $E$, the two angles that specify $\Omega$, and time $t$.
Posed in terms of a general source term $s$, the 1-D transport equation
is
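$$
\begin{aligned}
\frac{1}{v}\frac{\partial \varphi}{\partial t} + \mu \frac{\partial \varphi}{\partial x} + \Sigma_t(x,E)\,\varphi(x,E,\mu,t)
&= \int_{-1}^{1} d\mu' \int_0^\infty dE'\, \Sigma_s(x, E' \to E, \mu' \to \mu)\, \varphi(x,E',\mu',t)\\
&\quad + s(x,E,\mu,t),
\end{aligned}
\tag{2.15}
$$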


where $\mu = \cos\theta$ and $\theta$ is the angle between the direction of motion and the $x$-axis.
Further details regarding the derivation and numerical implementation of the neutron
transport equations can be found in [76, 233].
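To make the 1-D equation concrete, the sketch below solves a strongly simplified steady-state, one-energy-group version on a slab $[0, L]$ with vacuum boundaries and isotropic scattering and source. Writing $\psi(x,\mu)$ for the angular flux and $\phi(x) = \int_{-1}^{1}\psi\,d\mu$ for the scalar flux in this sketch only, the equation is $\mu\,\partial\psi/\partial x + \Sigma_t\psi = \tfrac{1}{2}(\Sigma_s\phi + Q)$. It uses the discrete ordinates ($S_N$) method with Gauss-Legendre directions, diamond differencing in space, and source iteration; these are standard textbook choices assumed here, not the methods of any particular production code. The parameter values and the function name `sn_slab` are hypothetical. Deep inside a thick slab the scalar flux should approach the infinite-medium value $Q/(\Sigma_t - \Sigma_s)$, which provides a simple check.

```python
import numpy as np


def sn_slab(L=20.0, n_cells=400, n_angles=8, sigma_t=1.0, sigma_s=0.5,
            Q=1.0, tol=1e-10, max_iter=1000):
    """Steady one-group S_N solve of
        mu dpsi/dx + sigma_t psi = (sigma_s phi + Q) / 2
    on [0, L] with vacuum boundaries; returns the cell scalar flux phi."""
    mu, w = np.polynomial.legendre.leggauss(n_angles)  # directions; weights sum to 2
    dx = L / n_cells
    phi = np.zeros(n_cells)                           # initial scalar flux guess
    for _ in range(max_iter):
        src = 0.5 * (sigma_s * phi + Q)               # isotropic emission per unit mu
        phi_new = np.zeros(n_cells)
        for m in range(n_angles):
            a = abs(mu[m]) / dx
            psi_in = 0.0                              # vacuum boundary: nothing enters
            cells = range(n_cells) if mu[m] > 0 else range(n_cells - 1, -1, -1)
            for i in cells:
                # Diamond difference: psi_cell = (psi_in + psi_out) / 2
                psi_cell = (src[i] + 2.0 * a * psi_in) / (sigma_t + 2.0 * a)
                psi_in = 2.0 * psi_cell - psi_in      # outgoing edge feeds the next cell
                phi_new[i] += w[m] * psi_cell
        if np.max(np.abs(phi_new - phi)) < tol:       # source iteration converged
            return phi_new
        phi = phi_new
    raise RuntimeError("source iteration did not converge")


phi = sn_slab()
print(phi[len(phi) // 2], phi[0])   # center vs. edge scalar flux
```

Source iteration converges slowly when $\Sigma_s/\Sigma_t$ is close to one, which is why production codes add acceleration schemes; that refinement is omitted here.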

The transport equation provides a highly accurate description of neutron
distributions in a reactor if one has correct macroscopic cross-section information.
Hence from the perspective of uncertainty quantification, one must quantify
uncertainties associated with the cross-sections and propagate them through the model
to construct uncertainties for the QoI. Moreover, the dependence of cross-sections
on $r$ and $E$ is complicated due to the complex reactor geometry and energy profiles,
which include resonant behavior and threshold effects.
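A minimal illustration of cross-section uncertainty propagation, under strong simplifying assumptions: in an infinite, homogeneous, one-group medium with source density $Q$, the balance $\Sigma_a\phi = Q$ gives $\phi = Q/\Sigma_a$. The sketch treats the absorption cross-section $\Sigma_a$ as normally distributed with an assumed 5% relative standard deviation, propagates it by Monte Carlo sampling, and compares the result with the first-order (linearized) estimate $\sigma_\phi \approx |\partial\phi/\partial\Sigma_a|\,\sigma_{\Sigma_a} = Q\,\sigma_{\Sigma_a}/\Sigma_a^2$. The nominal values and the function name `infinite_medium_flux` are hypothetical.

```python
import numpy as np


def infinite_medium_flux(sigma_a, Q=1.0):
    """One-group infinite-medium balance sigma_a * phi = Q, solved for phi."""
    return Q / sigma_a


Q = 1.0
sigma_a_mean, sigma_a_std = 0.5, 0.025   # hypothetical: 5% relative uncertainty

# Monte Carlo propagation of the cross-section uncertainty
rng = np.random.default_rng(42)
sigma_a = rng.normal(sigma_a_mean, sigma_a_std, size=100_000)
phi = infinite_medium_flux(sigma_a, Q)

# First-order (linearized) propagation: d(phi)/d(sigma_a) = -Q / sigma_a**2
phi_std_linear = Q * sigma_a_std / sigma_a_mean**2

print(f"MC mean {phi.mean():.4f}, MC std {phi.std():.4f}, "
      f"linearized std {phi_std_linear:.4f}")
```

For small relative uncertainties the two estimates agree closely; the linearized estimate degrades as the input uncertainty grows because $1/\Sigma_a$ is nonlinear.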

Due to their central role in nuclear reactor design, however, numerous
experiments have been performed to establish libraries of cross-section distributions for
various stable and radioactive nuclei that can be employed for uncertainty
quantification. It is noted in Chapter 17 of [51] that the establishment and use of these
libraries for reactor analysis was an early example of model calibration based on
data assimilation, sensitivity analysis, and uncertainty quantification based on a
systematic mathematical and physical framework.

Because the quantification of neutron distributions is so critical to nuclear
reactor design, numerous commercial, government laboratory, and proprietary neutron
transport codes have been developed. Deterministic codes include the massively
parallel transport code DENOVO developed at Oak Ridge National Laboratory
(ORNL) and the commercial code Attila. Probabilistic software includes the
ORNL code KENO and the Los Alamos National Laboratory (LANL) code MCNP.
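Probabilistic (Monte Carlo) transport codes simulate individual particle histories. As a toy sketch of that idea only, and not of how KENO or MCNP are implemented, the following Python function estimates the fraction of a normally incident, one-group neutron beam transmitted through a 1-D slab with absorption and isotropic scattering. The slab thickness, cross-sections, history count, and the function name `mc_transmission` are hypothetical. For a purely absorbing slab the exact answer is $e^{-\Sigma_t L}$, which gives a simple check.

```python
import numpy as np


def mc_transmission(L=2.0, sigma_t=1.0, sigma_s=0.0, n=50_000, seed=0):
    """Analog Monte Carlo estimate of the transmitted fraction of a beam
    entering a 1-D slab [0, L] at x = 0 along +x."""
    rng = np.random.default_rng(seed)
    c = sigma_s / sigma_t                     # scattering probability per collision
    transmitted = 0
    for _ in range(n):
        x, mu = 0.0, 1.0                      # start at the entrance face, moving in +x
        while True:
            x += mu * rng.exponential(1.0 / sigma_t)   # flight to the next collision
            if x >= L:                        # escaped through the far face
                transmitted += 1
                break
            if x < 0.0:                       # left back through the entrance face
                break
            if rng.random() >= c:             # absorbed at this collision
                break
            mu = rng.uniform(-1.0, 1.0)       # isotropic scattering: new direction cosine
    return transmitted / n


print(mc_transmission(), np.exp(-2.0))    # pure absorber vs. exact result
print(mc_transmission(sigma_s=0.5))       # scattering adds collided transmission
```

The statistical error of such an estimate decreases only as $1/\sqrt{n}$, which is one reason production Monte Carlo codes rely on variance reduction techniques.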


Thermal-Hydraulic Models

The characterization of primary coolant behavior in a nuclear reactor is highly
complex since it includes the integration of high pressure, two-phase flow dynamics,
heat conduction in the fuel, heat transfer, and neutron interactions in complex
geometries. However, it is also critical since accurate quantification
of void fraction distributions and boiling transitions is essential for optimized
performance and maintenance of safety margins. We summarize only aspects
of thermal-hydraulic models to illustrate some of the associated difficulties, and we
refer the reader to [51, 122, 206] for detailed derivation and numerical analysis of
these relations.

Due to the presence of both liquid and vapor phases in the reactor coolant,
two-phase flow models are used to simulate transient and steady state coolant
behavior. The two-phase mixture is modeled using conservation of mass, momentum,
and energy in combination with conservation relations for compounds such
as boron. Additionally, one employs closure relations commensurate with the
operating conditions; e.g., one uses different phenomenological relations for laminar
versus turbulent conditions or boiling versus high pressure nonboiling conditions.

To illustrate, let $\alpha_g$ and $\alpha_f$ respectively denote the volume fractions of gas and
fluid, and let $\rho_g$, $\rho_f$, $v_g$, and $v_f$ denote the densities and velocities of the gas and fluid
phases. The internal energies for the two phases are denoted by $e_g$ and $e_f$. As detailed
in (116)-(123) of Chapter 16 of [51], conservation of mass, momentum, and energy
respectively yield the fluid phase relations


$$
\frac{\partial (\alpha_f \rho_f)}{\partial t} + \nabla \cdot (\alpha_f \rho_f v_f) = -\Gamma,
$$
with analogous coupled gas phase relations. Here energy, momentum, and mass
exchange at the vapor/liquid interfaces are respectively modeled with the constitutive
relations

$$
\begin{aligned}
H &= \kappa\,(T_f - T_g),\\
F &= C\,(v_f - v_g),\\
\Gamma &= \gamma\,(1 - \alpha_g)\big(T_{\mathrm{sat}}(p_f) - T_g\big) - \cdots,
\end{aligned}
\tag{2.18}
$$


where $T_f$, $T_g$ and $s_f$, $s_g$ are the temperatures and entropy densities of the two phases,
$T_{\mathrm{sat}}(p_f)$ is the saturation temperature at the continuous phase pressure $p_f$, and
$\kappa$, $C$, and $\gamma$ are positive transport coefficients that must be estimated during model
calibration. The viscous and heat transport coefficients are modeled using the
constitutive relations

$$
\tau = \eta\,\nabla v, \qquad q = -\lambda\,\nabla T,
\tag{2.19}
$$


where $\eta$ and $\lambda$ must be estimated. Relations for the remaining closure quantities are
provided in Chapter 16 of [51]. Further details regarding the derivation of the
thermal-hydraulic equations and required constitutive relations for varying hypotheses
regarding the operating conditions and carrier fluid and dispersed gas particles
can also be found in [51, 122, 206].
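The calibration of transport coefficients such as those in (2.18) can be illustrated with a small least-squares sketch. Assume that the interfacial heat transfer takes the linear form $H = \kappa(T_f - T_g)$ and that measurements of $H$ are available at several temperature differences $\Delta T_i = T_f - T_g$ with additive Gaussian noise. The least-squares estimate is then $\hat\kappa = \sum_i \Delta T_i H_i / \sum_i \Delta T_i^2$. The synthetic data, noise level, "true" value, and function name `estimate_kappa` below are hypothetical; calibrating the coefficients of the full coupled model is far more involved.

```python
import numpy as np


def estimate_kappa(delta_T, H_obs):
    """Least-squares estimate of kappa in H = kappa * (T_f - T_g), no intercept."""
    return np.dot(delta_T, H_obs) / np.dot(delta_T, delta_T)


rng = np.random.default_rng(3)
kappa_true = 2.5                                   # hypothetical coefficient to recover
delta_T = rng.uniform(1.0, 20.0, size=50)          # synthetic T_f - T_g values
H_obs = kappa_true * delta_T + rng.normal(0.0, 1.0, size=50)   # synthetic noisy data

kappa_hat = estimate_kappa(delta_T, H_obs)

# Standard error from the residual variance (one estimated parameter)
residuals = H_obs - kappa_hat * delta_T
sigma2_hat = residuals @ residuals / (len(H_obs) - 1)
kappa_se = np.sqrt(sigma2_hat / (delta_T @ delta_T))
print(f"kappa_hat = {kappa_hat:.3f} +/- {kappa_se:.3f}")
```

The standard error quantifies the parameter uncertainty that would subsequently be propagated through the model; a positivity constraint on $\kappa$ could be imposed explicitly if the data were less informative.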

Various packages have been developed to numerically solve the thermal-hydraulic
equations, including RELAP5 developed at Idaho National Laboratory
(INL), CATHARE, FLICA4, and COBRA. As detailed in [206], RELAP5-3D provides
multidimensional hydrodynamic and reactor kinetics modeling capabilities.
The CATHARE package allows reactor coolant circuits to be modeled as
interconnected submodules, whereas FLICA4 combines 3-D, two-phase fluid simulation
capabilities with 1-D heat conduction for the fuel. COBRA is a subchannel code for
which the subchannel spacing constitutes the finest lateral mesh. All four provide
the capability for specifying various phenomenological closure conditions. Additionally,
nuclear energy companies such as Westinghouse have developed and employed
their own thermal-hydraulic codes, such as VIPRE-W.


2.4.3 Role of Uncertainty Quantification for Reactor Design


As with weather and climate models, there are four general sources of uncertainties
and errors in nuclear reactor models: input uncertainties, model errors, numerical
errors, and uncertainties in measurements. Like the atmospheric science relations
(2.7), the thermal-hydraulic relations (2.16) and (2.17) are based on conservation
of mass, momentum, energy, and species concentrations in combination with
phenomenological closure relations and source terms. The resulting coupled system is
highly nonlinear and numerically difficult to resolve and verify for small grid sizes.
For example, the lateral mesh in COBRA cannot be resolved beyond the dimensions
of the subchannels. It is noted in the RELAP5 code manual that the employed
system of equations is ill-posed. This necessitates the use of artificial damping to
obtain numerical solutions that are viable for nuclear reactor analysis.

In addition to numerical errors, the thermal-hydraulic relations exhibit varying
degrees of uncertainty in the phenomenological models used to characterize complex
or poorly understood physics, e.g., (2.18) and (2.19), turbulence relations,
and phenomenological models for quantifying subchannel void fractions and the
nonphysical parameters in these models. Unlike the cross-section uncertainties for
the neutron transport model, which are fairly well characterized experimentally,
the uncertainties for nonphysical thermal-hydraulic parameters must be estimated
using model calibration techniques before they can be propagated through models.
Finally, the harsh environment inside a nuclear reactor limits the number and type
of measurements that can be obtained for model calibration and validation and
experimental design.

A substantial challenge associated with uncertainty quantification for nuclear
reactor designs is the necessity for propagating uncertainties through several linked
simulation codes for all of the coupled subsystems. It has been illustrated that
heat transfer, coolant flow, neutron distributions, and fission reaction rates are all
tightly coupled to form a highly complex multicomponent, multiphysics system. The
resulting models and simulation codes are computationally intensive and require the
development of surrogate models to construct uncertainties for QoI. The synthesis
of surrogate modeling and modular approaches for uncertainty quantification in
large, multiphysics systems constitutes an area of active research.
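The surrogate idea can be sketched with a one-parameter example. Below, a cheap closed-form function stands in for an expensive coupled simulation (the function `expensive_model` is hypothetical), a polynomial surrogate is fit to 20 model evaluations using NumPy's `Polynomial.fit`, and an assumed uniform input uncertainty is then propagated by Monte Carlo sampling of the surrogate rather than of the model. Polynomial regression is only one of several possible surrogate choices, and the degree and number of training runs are illustrative.

```python
import numpy as np
from numpy.polynomial import Polynomial


def expensive_model(theta):
    """Hypothetical stand-in for a costly coupled simulation returning a QoI."""
    return np.exp(-theta) * np.cos(2.0 * theta) + theta


# Step 1: a small number of "expensive" runs at design points in [0, 2]
theta_train = np.linspace(0.0, 2.0, 20)
qoi_train = expensive_model(theta_train)

# Step 2: least-squares polynomial surrogate of degree 8
surrogate = Polynomial.fit(theta_train, qoi_train, deg=8)

# Step 3: propagate an assumed uniform input uncertainty through the surrogate only
rng = np.random.default_rng(7)
theta_mc = rng.uniform(0.0, 2.0, size=200_000)
qoi_mc = surrogate(theta_mc)
lo, hi = np.percentile(qoi_mc, [2.5, 97.5])
print(f"surrogate mean {qoi_mc.mean():.4f}, 95% interval [{lo:.4f}, {hi:.4f}]")
```

The 200,000 surrogate evaluations cost essentially nothing compared with 200,000 runs of a real simulation code; the price is an approximation error that should itself be checked against held-out model runs.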

As with weather, climate, and subsurface hydrology models, we are often
interested in quantifying uncertainties associated with statistical QoI. Relevant QoI
for nuclear reactor design include the following.


* Specify bounds on void fraction distributions and boiling transitions that
guarantee specified performance levels and safety margins.

* Specify conditions that limit CRUD¹ on the outside of fuel cladding to within
prescribed levels.

* Determine new cladding materials, fuel materials, and fuel pin geometries
that provide an average specified improvement in performance and increased
resistance to damage.

* Determine conditions that produce specified levels of radiation damage,
mechanical thermal fatigue, and corrosion.


In all cases, measurement, input, model, and numerical uncertainties must be
determined to provide predictive estimates for the QoI with quantified and reduced
uncertainties.


Remark 2.1. An issue that complicates model calibration for many comprehensive
simulation packages, including those considered for nuclear reactor simulations, is
the fact that inputs and parameters are often hard-coded and inaccessible to users.
Moreover, the values of hard-coded inputs are specified on a basis that is often not
reported. For example, only one parameter is generally accessible in COBRA. This

¹CRUD is a colloquial term that refers to corrosion and wear products that become radioactive
when exposed to radiation. The acronym originated from references to Chalk River Unidentified
Deposit.
is done to facilitate use by nonexperts and to minimize inadvertent changes to
inputs. However, this necessitates significant code modification for model calibration
and uncertainty quantification, which complicates the process.

2.4.4 Probabilistic Risk Assessment (PRA) 

The field of PRA grew out of the Three Mile Island nuclear power plant accident
in 1979 and the Challenger space shuttle disaster in 1986. As detailed in [32, 245],
structured PRA involves defining a system failure for complex multicomponent,
multiphysics problems, identifying basic events that can cause a system failure,
building a fault tree to relate component events to a system failure, and relating
the joint probability of events to the probability of a system failure. The assignment
of probabilities to events can be quantitative or qualitative in the sense that expert
opinion is used to assign probabilities to various scenarios, as is the case with
economic and technological growth in climate models. As detailed in [245], the
related field of risk management is an example of optimization under uncertainty
where the goal is to mitigate risk in the presence of uncertainty. This involves
optimization techniques such as stochastic and fuzzy programming.
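The fault-tree step can be sketched for the simplest case in which basic events are statistically independent: an AND gate multiplies the input probabilities, and an OR gate gives $1 - \prod_i (1 - p_i)$. The tree (two redundant pumps and a power supply), the probabilities, and the helper names `p_and` and `p_or` below are hypothetical; in practice common-cause failures violate the independence assumption and must be modeled explicitly.

```python
from math import prod


def p_and(*probs):
    """AND gate: probability that all independent input events occur."""
    return prod(probs)


def p_or(*probs):
    """OR gate: probability that at least one independent input event occurs."""
    return 1.0 - prod(1.0 - p for p in probs)


# Hypothetical basic-event probabilities per demand
p_pump_a = 0.01
p_pump_b = 0.01
p_power = 0.001

# Hypothetical tree: system fails if both pumps fail OR the power supply fails
p_both_pumps = p_and(p_pump_a, p_pump_b)
p_system = p_or(p_both_pumps, p_power)
print(f"P(system failure) = {p_system:.6e}")
```

Here the redundancy reduces the pump contribution to $10^{-4}$, so the single power supply dominates the system failure probability, which is the kind of insight a fault tree is meant to expose.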


2.4.5 References


The 1976 text [76] is fairly old, but it still provides a comprehensive and very
readable exposition of issues pertaining to nuclear reactor design and associated
models. This can be complemented by the text [233]. The text [151] is a good
reference regarding computational techniques for the neutron transport equation, and
the reader is referred to [122] for additional information pertaining to the
thermal-hydraulic equations. The 2012 five-volume Handbook of Nuclear Engineering [51]
is a great resource detailing the present state of the art for numerous aspects of
the field. Specifically, readers are respectively referred to Chapters 1, 5, 7, 15, 17,
and 28 for information regarding neutron cross-section measurements, the general
principles of neutron transport, mathematics for nuclear engineering, multiphase
flows, sensitivity and uncertainty analysis, and the scientific basis for nuclear waste
management. We note that an eBook version is available if hard copies cannot be
obtained.


2.5 Biological Models 

The role of predictive estimation for biological applications ranging from
molecular dynamics to ecosystems has burgeoned in the last 25 years and will become
increasingly important as measurement techniques, models, numerical algorithms,
computational architectures, and uncertainty quantification techniques continue to
evolve. To focus the discussion of issues pertinent to predictive estimation, we
consider the following biological scales.

* Molecular. Molecular biology focuses on the chemical compositions and
interactions required for living organisms. This ranges from fundamental
investigations of proteins and amino acids to understanding and cataloguing the
hereditary and information-carrying capabilities of DNA and RNA. Significant
recent advances have resulted from improved measurement capabilities,
the development of statistical and mathematical modeling techniques, such
as dynamic models for proteins and hidden Markov models for biological
sequencing, and improved numerical algorithms and computational resources.

* Cellular. Cellular biology broadly focuses on self-replicating units such as
viruses, bacteria, and animal and plant cells, along with their interactions
and communication. The role of mathematical models has been established
at this level for at least 30 years, due in part to the availability of relevant
data. Models have successfully elucidated or helped to explain phenomena
such as cellular kinetics, cell cycle dynamics, information processing and signal
transduction in cells, tissue patterning, and cellular robustness and mutations.


* Organisms. This biological level focuses on the properties of multicellular
tissues, organs, and organ systems that comprise organisms such as animal
and plant species. This area has grown significantly due to improved
noninvasive diagnostic techniques and the complementary development of
comprehensive component and system models. Representative examples include
coupled models for the heart and circulatory system and models supporting
the development of synthetic organs such as artificial hearts and kidneys. This
level also includes the interaction of organisms with the environment and their
breakdown when robustness fails. This includes autoimmune afflictions such
as rheumatoid arthritis and viral diseases such as the common cold, influenza
strains, and the human immunodeficiency virus (HIV).

* Populations. Population biology focuses on the interactions, genetic
variations, and disease spread among individuals of a single species. Deterministic
and stochastic compartmental models have long played a significant role in
quantifying processes such as disease dynamics and spread, with more recent
models focusing on the role of distributed attributes such as age or size.


* Communities and Ecosystems. Populations almost never live in isolation,
and the final level addresses the interactions between various species
and their environment. This ranges from classical models, such as the Lotka
and Volterra predator-prey models developed in 1925-1926 [159, 258], to models
quantifying the anthropogenic impact on the climate and environment, as
discussed in Sections 2.2 and 2.3.
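As an example of the classical ecosystem models mentioned above, the Lotka-Volterra predator-prey system $\dot x = \alpha x - \beta x y$, $\dot y = \delta x y - \gamma y$, with prey $x$ and predators $y$, can be integrated with the classical fourth-order Runge-Kutta scheme. The parameter values, initial populations, step size, and function names below are hypothetical. Because $V = \delta x - \gamma \ln x + \beta y - \alpha \ln y$ is conserved along exact solutions, its drift gives a simple check on the numerical integration.

```python
import numpy as np

ALPHA, BETA, DELTA, GAMMA = 1.0, 0.1, 0.075, 1.5   # hypothetical rate parameters


def lotka_volterra(state):
    """Right-hand side for prey x and predators y."""
    x, y = state
    return np.array([ALPHA * x - BETA * x * y, DELTA * x * y - GAMMA * y])


def rk4(f, y0, t_end, n_steps):
    """Classical fourth-order Runge-Kutta with a fixed step; returns the trajectory."""
    h = t_end / n_steps
    y = np.array(y0, dtype=float)
    traj = [y.copy()]
    for _ in range(n_steps):
        k1 = f(y)
        k2 = f(y + 0.5 * h * k1)
        k3 = f(y + 0.5 * h * k2)
        k4 = f(y + h * k3)
        y = y + h / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
        traj.append(y.copy())
    return np.array(traj)


def invariant(traj):
    """Conserved quantity V(x, y) of the exact Lotka-Volterra dynamics."""
    x, y = traj[:, 0], traj[:, 1]
    return DELTA * x - GAMMA * np.log(x) + BETA * y - ALPHA * np.log(y)


traj = rk4(lotka_volterra, [10.0, 5.0], t_end=30.0, n_steps=3000)
V = invariant(traj)
print(f"max prey {traj[:, 0].max():.2f}, max predators {traj[:, 1].max():.2f}, "
      f"invariant drift {np.max(np.abs(V - V[0])):.2e}")
```

In an uncertainty quantification setting, the rate parameters would be calibrated from population data and their uncertainties propagated to predicted population cycles.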


The delineation of biological processes into these scales facilitates our
understanding of the processes but is somewhat misleading in the sense that it neglects
the coupling that intrinsically occurs between scales. Issues that complicate, but
are critical for, predictive estimation include the following.


* Biological phenomena often occur over vast space- and timescales, with events
at one scale strongly affecting those at other scales, e.g., interactions between
viral cell dynamics and a host organism. The development of comprehensive
multiscale models requires significant data across these scales.
* Biological systems can include a very large number of components (e.g., mil-
lions or more) with complex interactions between objects, e.g., intramolecular
DNA and RNA interactions. These interactions are often far from equilibrium
or steady state, and high-order interactions are often the rule.


* The feedback mechanisms associated with biological phenomena are often
highly complex, involve multiple coupled components, have long timescales,
exhibit significant variability between individuals, and are difficult to quantify
using precise physical laws, e.g., enzyme responses to metabolism and immune
responses to viral infection.


* The organizational features of biological systems are typically highly complex
and vary significantly among individuals. This introduces substantial un-
certainty when bridging between individual and population properties, e.g.,
health trends such as obesity or disease dynamics.


* There are often more parameters than data, which prohibits the use of statisti-
cal techniques developed for data-rich applications, e.g., modeling the disease
dynamics of rare diseases.

* Data is often highly inaccurate, exhibits substantial noise, or contains signif-
icant uncertainties, e.g., determining HIV occurrence in countries where the
stigma of the disease deters accurate reporting. Furthermore, data is often
in a form that differs significantly from the state variables in models, e.g.,
photos of phenomena. This complicates model calibration and validation and
can necessitate the development of new techniques for both.


* Phenomena are often inherently discrete, which prohibits the use of calculus
and requires techniques such as information science or combinatorics, e.g.,
gene sequence analysis.

* Models for biological phenomena are often highly complex and have numerous
inputs or parameters that cannot be measured directly but rather must be
inferred through fits to data; e.g., see the HIV model discussed in Section 2.5.1.
These parameters are often highly correlated and may not be identifiable.

* Models must balance accuracy with efficiency and utility to be useful for
diagnosis, guiding treatment regimes, policies, or general control design.
Hence model discrepancy must be treated in a computationally tractable man-
ner, e.g., models used to determine future influenza vaccines, HIV treatment
schedules, or policies concerning anthropogenic environmental impact.

* The uncertainty in biological processes and variability among individuals of-
ten necessitate the use of stochastic forces or models to quantify uncertain
processes, e.g., gene, bacteria, or viral evolution.

It is clear that incorporation and accommodation of uncertainties inherent to
data, models, inputs, and numerical algorithms is critical to the success of predictive
estimation for complex biological and biomedical applications. Furthermore, as
illustrated in Figure 1.1, the design of calibration and validation experiments must
be tightly integrated with model development and numerical simulation to optimize
the impact of predictive estimation for biological applications.

To illustrate the role of uncertainty quantification for predictive estimation in
the context of disease dynamics, we next summarize an HIV model used to provide
insights into the biological processes associated with this disease and to guide the
establishment of treatment regimes.


2.5.1 HIV Model 

The focus on HIV in the media has dwindled over the last decade due, in part,
to the success of antiretroviral treatment regimes for slowing the progression of
the disease and delaying the onset of acquired immune deficiency syndrome
(AIDS). However, it is estimated that approximately 0.5% of the world's population
is presently living with the HIV infection [41], and the percentage of HIV-positive
individuals in the United States is increasing. Hence the development of data-driven
models with reduced and quantified uncertainties is critical for understanding the
disease, reducing its spread, and optimizing treatment strategies.

The scope of models ranges from the cellular to the population level, with goals
ranging from quantifying properties of the virus to predicting the disease spread in
various cultures. We illustrate the role of uncertainty quantification in a model used
to quantify aspects of HIV progression in an individual. An initial version of this
model was proposed in [55], and it was extended significantly in [2, 3] to provide a
framework for investigating facets of optimal treatment protocols for HIV.

As illustrated in Figure 2.18, the model is composed of seven dynamic com-
partments, where each compartment represents a specific cell concentration through-
out the body. Here $T_1$ and $T_2$ denote uninfected type 1 and type 2 target cells, which
could respectively represent, for example, CD4 T-lymphocytes and macrophages.
Infected target cells are denoted by $T_1^*$ and $T_2^*$. Infectious and noninfectious free
viruses are denoted by $V_I$ and $V_{NI}$, and immune effector cells are denoted by $E$. We
note that $T_1$, $T_1^*$, $T_2$, $T_2^*$, and $E$ have magnitudes on the order of cells/µl, whereas
$V_I$ and $V_{NI}$ have scales of cells/ml. The wide variation in scales exhibited by states
is common in physical and biological ODE models and motivates the use of loga-
rithmic scales in the manner detailed in Section 3.2.

Figure 2.18. Compartments in the HIV model, modified from [3]: uninfected target
cells, infectious virus, infected target cells, noninfectious virus, and immune effectors (CTLs).

Uninfected cells $T_i$ become infected, with rates $k_i$, when they encounter in-
fectious free virus $V_I$. The treatment factor $\epsilon_1(t)$ models a reverse transcriptase
inhibitor (RTI) that blocks new infections. This is potentially more effective for
type 1 target cells than for type 2, where the efficacy is represented by $f\epsilon_1(t)$ with
$f \in [0,1]$. Natural source and death rates for the two types of cells are represented
by $\lambda_i$ and $d_i$. Free viruses are produced by infected cells, and they leave the com-
partment by infecting cells or via natural death at rate $c$. Finally, the immune
effectors $E$ are produced in response to existing immune effectors and the presence
of infected cells.

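As detailed in [3], the resulting ODE model is

$$
\begin{aligned}
\dot T_1 &= \lambda_1 - d_1 T_1 - [1-\epsilon_1(t)]k_1 V_I T_1,\\
\dot T_2 &= \lambda_2 - d_2 T_2 - [1-f\epsilon_1(t)]k_2 V_I T_2,\\
\dot T_1^* &= [1-\epsilon_1(t)]k_1 V_I T_1 - \delta T_1^* - m_1 E T_1^*,\\
\dot T_2^* &= [1-f\epsilon_1(t)]k_2 V_I T_2 - \delta T_2^* - m_2 E T_2^*,\\
\dot V_I &= [1-\epsilon_2(t)]10^3 N_T\delta(T_1^* + T_2^*) - cV_I - [1-\epsilon_1(t)]\rho_1 10^3 k_1 T_1 V_I - [1-f\epsilon_1(t)]\rho_2 10^3 k_2 T_2 V_I,\\
\dot V_{NI} &= \epsilon_2(t)10^3 N_T\delta(T_1^* + T_2^*) - cV_{NI},\\
\dot E &= \lambda_E + \frac{b_E(T_1^* + T_2^*)}{T_1^* + T_2^* + K_b}E - \frac{d_E(T_1^* + T_2^*)}{T_1^* + T_2^* + K_d}E - \delta_E E,
\end{aligned}
\tag{2.20}
$$

where the remaining parameters are defined in Table 2.1. The factors of $10^3$ convert
between microliter and milliliter scales.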

For the experiments described in [3], data consisted of measurements of the
total viral load $V = V_I + V_{NI}$ and total CD4+ T-lymphocyte counts $T_1 + T_1^*$. In
the control problem, inputs consist of the RTI treatment factor $\epsilon_1(t)$ and the action
$\epsilon_2(t)$ of a protease inhibitor, which causes infected cells to produce noninfectious
viruses $V_{NI}$. The control objective is to reduce the viral load $V$ using reasonable
levels of $\epsilon_1(t)$ and $\epsilon_2(t)$.


2.5.2 Source and Role of Uncertainties

Measurement Errors 

As detailed in [3], viral loads $V = V_I + V_{NI}$ were quantified using reverse
transcriptase-polymerase chain reaction (RT-PCR) techniques, which have a linear
range of 400 to 750,000 copies/ml for the standard assay and 50 to 100,000 copies/ml
for the ultrasensitive assay. For samples with viral loads that exceeded the upper
limit, the sample was diluted until the virus numbers were within the prescribed
range, and the measurements were scaled accordingly. Whereas this technique ac-
commodated large viral loads, it is also a source of measurement errors. For viral
loads below 50 copies/ml, this value was prescribed as a lower limit, as shown in
Figure 2.19. Left-censoring of the data in this manner introduces unavoidable mea-
surement error and must be accommodated when estimating parameters through a
least squares fit to data.

Table 2.1. Parameters used in the HIV model (2.20).

  $\lambda_1$     Target cell 1 production rate       $\rho_1$      Ave. virions infecting type 1 cell
  $\lambda_2$     Target cell 2 production rate       $\rho_2$      Ave. virions infecting type 2 cell
  $d_1$           Target cell 1 death rate            $b_E$         Max. birth rate, immune effectors
  $d_2$           Target cell 2 death rate            $d_E$         Max. death rate, immune effectors
  $k_1$           Population 1 infection rate         $K_b$         Birth constant, immune effectors
  $k_2$           Population 2 infection rate         $K_d$         Death constant, immune effectors
  $c$             Virus natural death rate            $\lambda_E$   Immune effector production rate
  $\delta$        Infected cell death rate            $\delta_E$    Natural death rate, immune effectors
  $\epsilon_1$    Population 1 treatment efficacy     $N_T$         Virions produced per infected cell
  $m_1$           Population 1 clearance rate         $f$           Treatment efficacy reduction
  $m_2$           Population 2 clearance rate
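The text does not say how the left-censored values were accommodated in [3], so the Python sketch below shows only one simple heuristic, under the assumption that a recorded value at the 50 copies/ml limit means the true load is at or below that limit: a censored point contributes a residual only when the model prediction exceeds the limit. The function names and data are hypothetical, and likelihood-based treatments of censoring are a common, more principled alternative.

```python
def censored_residuals(model_values, data, lower_limit=50.0):
    """Residuals for least squares with left-censored observations (heuristic sketch).

    A datum at or below `lower_limit` is treated as left-censored: it only says the
    true load is <= lower_limit, so the model is penalized only when its prediction
    exceeds the limit. Uncensored data use ordinary residuals.
    """
    residuals = []
    for model, obs in zip(model_values, data):
        if obs <= lower_limit:                      # left-censored observation
            residuals.append(max(0.0, model - lower_limit))
        else:                                       # ordinary observation
            residuals.append(model - obs)
    return residuals


def censored_sum_of_squares(model_values, data, lower_limit=50.0):
    """Least squares cost assembled from the censored residuals."""
    return sum(r * r for r in censored_residuals(model_values, data, lower_limit))


# Hypothetical model predictions and recorded viral loads (copies/ml).
model = [30.0, 80.0, 200.0]
recorded = [50.0, 50.0, 150.0]
print(censored_residuals(model, recorded))        # [0.0, 30.0, 50.0]
print(censored_sum_of_squares(model, recorded))   # 3400.0
```

Given the wide range of viral loads, one would likely apply the same logic to $\log_{10}$ values in practice.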

Model and Input Uncertainties


The model (2.20) is a vastly simplified representation of components in an
extremely complex, and only partially understood, process. Hence its scope and
accuracy are intrinsically limited, and it is unreasonable to expect it to provide
direct answers regarding the pathogenesis of the HIV infection. When employed in
conjunction with data, however, it can be used to elucidate properties of the viral
transmission mechanisms and guide the development of control-based treatment
regimes based on the reduction of the viral load.

It is noted in [2, 3] that whereas certain parameters, such as $c$, $N_T$, $\delta$, $d_1$,
and $d_2$, can be estimated from human or macaque data, the remainder must be
inferred through a fit to data. Furthermore, initial conditions for the states are
generally unknown, so they too must be estimated. Due to the measurement errors
and model discrepancy terms, these input terms will also have inherent uncertainty,
which must be quantified using the techniques of Chapters 7 and 8.

Figure 2.19. Left-censored viral data from [3] (viral load versus time in days).

2.5.3 Robust Control Design 

The objective of using the model to guide treatment regimes can be posed as a con-
trol problem in which one seeks to regulate the viral load $V$ to zero using the RTI
treatment factor $\epsilon_1(t)$ and protease inhibitor level $\epsilon_2(t)$ as control inputs. In [2],
linear quadratic regulator (LQR) theory was used to construct continuous and struc-
tured treatment interruption strategies. Whereas LQR theory employs feedback to
provide a degree of robustness with regard to state uncertainty, this control method
is not explicitly designed to provide optimal robustness for uncertain systems.

Alternatively, one can consider the average viral load at time $t$ to be the
QoI, with uncertainties quantified using the techniques detailed in Chapters 9 and
10. These quantified uncertainties can then be used to construct optimal gains for
robust control techniques such as sliding mode or $H_\infty$ control designs. This use of
uncertainty quantification for improved robust control design constitutes an active
research area.


2.5.4 References 

The monograph [181] summarizes the status in 2005 and future directions rec-
ommended by the DOE Computational Biology Committee on Mathematical Sci-
ences Research regarding research at the interface between mathematics and biol-
ogy. Specifically, it delineates ways in which mathematics has impacted molecular,
cellular, organismal, population, and community and ecosystem biology and recom-
mends directions that merit increased support. The reader is referred to [...]
for details regarding models at all of these levels, disease models, and control strate-
gies for various diseases in 1995. Mathematical modeling in biology, biomedical, and



voted to mathematical models for communicable diseases. Details regarding models
quantifying aspects of HIV pathogenesis can be found in [2, 3, 19, 55, 191], whereas
models for the transmission dynamics of HIV are provided in [40] and the references
therein.




Chapter 3 


Prototypical Models 


3.1 Models 

The equations of atmospheric physics (2.7), groundwater flow (2.12), and thermal-
hydraulic flow (2.16) and (2.17) are nonlinear and couple complex multiphysics
systems. Similarly, structural and material applications require 2-D and 3-D shell
and continuum models, whereas biological and biomedical systems are routinely
quantified using coupled, nonlinear ODE and PDE models. The development of
uncertainty quantification techniques for applications requiring this level of model-
ing is critical, and facets of current research are focused on techniques to address the
complexities of high-dimensional, coupled, nonlinear ODE and PDE models with
large numbers of parameters. This includes the construction of surrogate models,
as detailed in Chapter 13.

To illustrate basic techniques, however, we consider a suite of simpler models, 
many of which have analytic solutions. Fundamental issues required to extend the 
methodologies to significantly more complex problems are noted where relevant. 


Example 3.1 (Exponential Processes). We consider first the initial value problem

$$
\frac{dz}{dt} = \alpha z + b(t), \quad z(0) = z_0,
\tag{3.1}
$$

which models the exponential growth or decay of a process with forcing term $b(t)$.
We collectively denote parameters by $q = [\alpha, z_0]$. The solution

$$
z(t,q) = e^{\alpha t}\left[z_0 + \int_0^t e^{-\alpha s} b(s)\, ds\right]
\tag{3.2}
$$

depends on both the independent variable $t$ and the parameters $q$. It is impor-
tant to note that although the model (3.1) exhibits a linear dependence on the
state or dependent variable $z$, the dependence of $z(t,q)$ on the parameters is highly
nonlinear.
Despite the simplicity of (3.1), exponential processes play a significant role in
the applications of Chapter 2, including radioactive decay and cross-section relations
for nuclear processes and certain chemical reactions for atmospheric species.

As detailed in Chapter 1, the uncertainty quantification problem has two com-
ponents: determine uncertainties in the parameters $q$ and forcing term $b(t)$, and
quantify the effect of these uncertainties on the state response $z$. The first com-
ponent is addressed in Chapters 7 and 8, whereas Chapters 9 and 10 address the
second.

To incorporate these random effects, it is illustrated in Chapters 4 and 5 that
we can consider $\alpha(\omega)$ and $z_0(\omega)$ to be random variables, where $\omega$ is an event in an
underlying probability space. Similarly, we can consider $b(t,\omega)$ and $z(t,\omega)$ to be
random processes in the manner detailed in Section 4.5, with the added assumption
that $b$ is continuous in $t$ for each realization of $\omega$. The resulting model is the random
differential equation

$$
\frac{dz}{dt} = \alpha(\omega)z + b(t,\omega), \quad z(0) = z_0(\omega),
\tag{3.3}
$$

which has the solution

$$
z(t,\omega) = e^{\alpha(\omega)t}\left[z_0(\omega) + \int_0^t e^{-\alpha(\omega)s} b(s,\omega)\, ds\right].
\tag{3.4}
$$

We note that sample paths specified by (3.4) are differentiable functions of $t$. As
detailed in Section 4.7, this fundamentally delineates random differential equations
from stochastic differential equations, which are not differentiable in time and require
Itô calculus for correct formulation and solution.
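Because each realization of the random differential equation (3.3) is an ordinary, differentiable solution, uncertainty can be propagated path by path. The Python sketch below assumes a constant forcing $b(t) = b_0$, for which the closed-form solution of (3.1) becomes $z = e^{\alpha t}z_0 + b_0(e^{\alpha t}-1)/\alpha$; it checks that formula against a classical fourth-order Runge-Kutta integration and then averages sample paths for hypothetical normal densities on $\alpha(\omega)$ and $z_0(\omega)$. The densities, $b_0$, and all numerical values are illustrative assumptions.

```python
import math
import random


def z_exact(t, alpha, z0, b0):
    """Closed-form solution of dz/dt = alpha*z + b0, z(0) = z0, for constant forcing b0."""
    if abs(alpha) < 1e-12:                      # limit alpha -> 0
        return z0 + b0 * t
    growth = math.exp(alpha * t)
    return growth * z0 + b0 * (growth - 1.0) / alpha


def z_rk4(t_final, alpha, z0, b0, n_steps=1000):
    """Classical fourth-order Runge-Kutta approximation of the same initial value problem."""
    def rhs(z):
        return alpha * z + b0

    h = t_final / n_steps
    z = z0
    for _ in range(n_steps):
        k1 = rhs(z)
        k2 = rhs(z + 0.5 * h * k1)
        k3 = rhs(z + 0.5 * h * k2)
        k4 = rhs(z + h * k3)
        z += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return z


def mean_response(t, b0=0.2, n_samples=20000, seed=1):
    """Monte Carlo estimate of E[z(t, omega)] for the random equation (3.3).

    alpha ~ N(-0.5, 0.1^2) and z0 ~ N(1.0, 0.05^2) are hypothetical densities.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        alpha = rng.gauss(-0.5, 0.1)
        z0 = rng.gauss(1.0, 0.05)
        total += z_exact(t, alpha, z0, b0)      # each sample path solves an ordinary ODE
    return total / n_samples


print(z_exact(2.0, -0.5, 1.0, 0.2), z_rk4(2.0, -0.5, 1.0, 0.2))
print(mean_response(2.0))
```

The sample mean differs slightly from the response at the mean parameters because $z$ depends nonlinearly on $\alpha$, which is the point made after (3.1).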


Example 3.2 (Simple Harmonic Oscillator). The simple harmonic oscillator
model

$$
m\frac{d^2z}{dt^2} + c\frac{dz}{dt} + kz = f_0\cos(\omega_F t), \quad z(0) = z_0, \quad \frac{dz}{dt}(0) = z_1,
\tag{3.5}
$$

is a prototype for various damped, periodic processes. Here $m > 0$, $c > 0$, and
$k > 0$ respectively denote the mass, damping, and stiffness coefficients, and $\omega_F$ is
the driving frequency.

The solution of (3.5) is

$$
z(t) = c_1e^{r_1t} + c_2e^{r_2t} + \frac{f_0}{\sqrt{m^2(\omega_0^2 - \omega_F^2)^2 + c^2\omega_F^2}}\cos(\omega_F t - \delta),
\tag{3.6}
$$

where the natural frequency is $\omega_0 = \sqrt{k/m}$, $\delta$ is the solution to

$$
\cos\delta = \frac{m(\omega_0^2 - \omega_F^2)}{\sqrt{m^2(\omega_0^2 - \omega_F^2)^2 + c^2\omega_F^2}},
$$

and

$$
r_{1,2} = \frac{-c \pm \sqrt{c^2 - 4km}}{2m}
$$

are the roots of the characteristic equation. The constants $c_1$ and $c_2$ are determined
by the initial conditions.

Since $m > 0$, $c > 0$, and $k > 0$, the terms $c_1e^{r_1t}$ and $c_2e^{r_2t}$ decay as $t \to \infty$, and
the long-term behavior is governed by the steady state solution

$$
z_p(t) = \frac{f_0}{\sqrt{m^2(\omega_0^2 - \omega_F^2)^2 + c^2\omega_F^2}}\cos(\omega_F t - \delta),
\tag{3.7}
$$

the maximum value of which is

$$
\frac{f_0}{\sqrt{m^2(\omega_0^2 - \omega_F^2)^2 + c^2\omega_F^2}}.
\tag{3.8}
$$

In a later chapter, we use (3.8) to illustrate the effects of nonlinear behavior near reso-
nance. Further details regarding the use of this model to illustrate concepts from
uncertainty quantification and statistical model validation can be found in [11].
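The amplitude (3.8) depends on the driving frequency only through $m^2(\omega_0^2-\omega_F^2)^2 + c^2\omega_F^2$; minimizing this quadratic in $\omega_F^2$ gives the resonant frequency $\sqrt{\omega_0^2 - c^2/(2m^2)}$ whenever $c^2 < 2km$, and $\omega_F = 0$ otherwise. The Python sketch below, with hypothetical lightly damped parameter values, compares that closed form with a brute-force grid search.

```python
import math


def amplitude(omega_f, m, c, k, f0):
    """Steady state amplitude (3.8) of the forced oscillator (3.5)."""
    omega0_sq = k / m                                   # natural frequency squared
    return f0 / math.sqrt(m**2 * (omega0_sq - omega_f**2) ** 2 + c**2 * omega_f**2)


def resonant_frequency(m, c, k):
    """Maximizer of (3.8): sqrt(k/m - c^2/(2 m^2)) if c^2 < 2km, otherwise 0."""
    s = k / m - c**2 / (2.0 * m**2)
    return math.sqrt(s) if s > 0.0 else 0.0


def grid_search_peak(m, c, k, f0, omega_max, n=200001):
    """Brute-force check: largest amplitude on a uniform grid of driving frequencies."""
    best_w, best_a = 0.0, amplitude(0.0, m, c, k, f0)
    for i in range(1, n):
        w = omega_max * i / (n - 1)
        a = amplitude(w, m, c, k, f0)
        if a > best_a:
            best_w, best_a = w, a
    return best_w, best_a


# Hypothetical parameter values.
m, c, k, f0 = 1.0, 0.2, 4.0, 1.0
print(resonant_frequency(m, c, k), grid_search_peak(m, c, k, f0, omega_max=4.0))
```

Small changes in $c$ or $k$ shift and reshape this peak sharply, which is why responses near resonance are sensitive to parameter uncertainty.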

A second case we will consider is the unforced problem $f_0 = 0$. For the case
of underdamped motion, $c^2 - 4km < 0$, the solution can be expressed as

$$
z(t) = e^{-(c/2m)t}\left[C_1\cos\left(\frac{\sqrt{4mk - c^2}}{2m}t\right) + C_2\sin\left(\frac{\sqrt{4mk - c^2}}{2m}t\right)\right],
\tag{3.9}
$$

where

$$
C_1 = z_0, \quad C_2 = \frac{c}{\sqrt{4mk - c^2}}z_0 + \frac{2m}{\sqrt{4mk - c^2}}z_1.
\tag{3.10}
$$


For the model (3.5), $m$, $c$, $k$, $f_0$, and $z_1$ may be uncertain inputs and hence
can be considered as random variables. This will yield a random solution $z(t,\omega)$
whose distribution we wish to quantify based on the densities for the inputs.

When estimating and constructing densities for parameters, it is critical that
parameters be identifiable in the sense that they can be uniquely determined from
observations. One can easily see that this is not the case for (3.5) since the parameter
sets $q = [m, c, k, f_0]$ and $\tilde q = [1, \frac{c}{m}, \frac{k}{m}, \frac{f_0}{m}]$ yield the same state values $z(t)$. Hence
we also employ the formulation

$$
\frac{d^2z}{dt^2} + C\frac{dz}{dt} + Kz = F_0\cos(\omega_F t),
\tag{3.11}
$$

where $C = \frac{c}{m}$, $K = \frac{k}{m}$, and $F_0 = \frac{f_0}{m}$. The forced and unforced solutions to (3.11)
can be easily obtained from (3.6), (3.7), and (3.9) through the substitutions $c \to C$,
$k \to K$, $f_0 \to F_0$, and $m \to 1$.
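The lack of identifiability can be checked numerically: the unforced solution (3.9)-(3.10) evaluated for a parameter set $[m, c, k]$ and for the rescaled set $[1, c/m, k/m]$ gives the same trajectory. The Python sketch below uses hypothetical values of $m$, $c$, $k$, $z_0$, and $z_1$ chosen to satisfy the underdamped condition $c^2 < 4mk$.

```python
import math


def underdamped(t, m, c, k, z0, z1):
    """Unforced, underdamped solution (3.9) with constants (3.10); requires c**2 < 4*m*k."""
    disc = 4.0 * m * k - c**2
    if disc <= 0.0:
        raise ValueError("parameters are not in the underdamped regime")
    root = math.sqrt(disc)
    omega_d = root / (2.0 * m)                          # damped oscillation frequency
    C1 = z0
    C2 = c * z0 / root + 2.0 * m * z1 / root
    return math.exp(-c * t / (2.0 * m)) * (C1 * math.cos(omega_d * t) + C2 * math.sin(omega_d * t))


# Hypothetical values for q = [m, c, k] and initial conditions z0, z1.
m, c, k = 2.0, 0.6, 18.0
z0, z1 = 1.0, 0.5
times = [0.1 * i for i in range(51)]

path_q = [underdamped(t, m, c, k, z0, z1) for t in times]
path_scaled = [underdamped(t, 1.0, c / m, k / m, z0, z1) for t in times]   # [1, c/m, k/m]
max_gap = max(abs(a - b) for a, b in zip(path_q, path_scaled))
print(f"largest difference between the two parameter sets: {max_gap:.1e}")
```

Any data generated by this model therefore constrain only the ratios $c/m$ and $k/m$, which is the motivation for the normalized formulation (3.11).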

In general, we can measure only a subset of the states in a model. For ex-
ample, a proximity sensor can be used to measure displacements $z$, whereas a laser
vibrometer provides velocity measurements. However, it is unlikely that both would
be used to measure the full state $u = [z, \dot z]^T$. Displacement or velocity observations
can be represented by

$$
y = C^Tu,
\tag{3.12}
$$

where $C^T = [1, 0]$ or $C^T = [0, 1]$.

However, other applications require more general responses or QoI, such as

$$
y = \int_0^T \gamma(t)z(t)\, dt,
\tag{3.13}
$$

where $\gamma(t)$ is a weight or filter over the time interval of interest. The full set of
parameters in this case would be $q = [C, K, \gamma(t)]$.

To provide a framework that includes both (3.12) and (3.13), we denote ob-
servations or responses by

$$
y = \mathcal{R}(u, q).
\tag{3.14}
$$

It will be illustrated in Section 3.3 that $\mathcal{R}$ can be interpreted as a functional or
operator defined on an appropriate Hilbert space.

The role of $\mathcal{R}$ is analogous to that of $f$ in (1.4). In (3.14), we highlight the
fact that we are observing the state $u$ by including it as an argument of $\mathcal{R}$. The
role of $u$ in (1.4) is not directly considered, and the output is instead formulated in
terms of the inputs $q$ and independent variables $x$.
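A minimal Python sketch of the observation model: it evaluates the full state $u = [z, \dot z]^T$ of the unforced, normalized oscillator (3.11) from the solution (3.9) with $m = 1$, forms displacement and velocity observations through (3.12), and approximates an integral QoI of the form (3.13) with the composite trapezoid rule. The values of $C$, $K$, $z_0$, $z_1$ and the weight $\gamma(t) = 1$ are hypothetical choices; note that the damping coefficient $C$ is unrelated to the observation vector in (3.12).

```python
import math

# Hypothetical parameters of (3.11) with F0 = 0, underdamped since C**2 < 4K.
C, K, Z0, Z1 = 0.3, 9.0, 1.0, 0.5
ROOT = math.sqrt(4.0 * K - C**2)
W = ROOT / 2.0                                # damped frequency
C1, C2 = Z0, (C * Z0 + 2.0 * Z1) / ROOT       # constants (3.10) with m = 1


def state(t):
    """Full state u = [z, dz/dt] from the unforced solution (3.9) with m = 1."""
    decay = math.exp(-0.5 * C * t)
    cs, sn = math.cos(W * t), math.sin(W * t)
    z = decay * (C1 * cs + C2 * sn)
    dz = -0.5 * C * z + decay * W * (-C1 * sn + C2 * cs)
    return [z, dz]


def observe(u, c_vec):
    """Pointwise observation y = C^T u as in (3.12)."""
    return sum(ci * ui for ci, ui in zip(c_vec, u))


def weighted_qoi(gamma, t_final, n=20000):
    """QoI y = int_0^T gamma(t) z(t) dt, approximated by the composite trapezoid rule."""
    h = t_final / n
    vals = [gamma(i * h) * state(i * h)[0] for i in range(n + 1)]
    return h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))


u = state(1.0)
print("displacement:", observe(u, [1.0, 0.0]), "velocity:", observe(u, [0.0, 1.0]))
print("QoI with gamma = 1 on [0, 5]:", weighted_qoi(lambda t: 1.0, 5.0))
```

With $\gamma \equiv 1$, integrating the unforced equation over $[0, T]$ gives $\int_0^T z\,dt = -[\dot z(T) - z_1 + C(z(T) - z_0)]/K$, an independent check on the quadrature.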


Example 3.3 (HIV Model). To illustrate aspects of uncertainty quantification
for a nonlinear system of ODEs arising in a biomedical application, we employ the
model

$$
\begin{aligned}
\dot T_1 &= \lambda_1 - d_1T_1 - (1-\epsilon)k_1VT_1,\\
\dot T_2 &= \lambda_2 - d_2T_2 - (1-f\epsilon)k_2VT_2,\\
\dot T_1^* &= (1-\epsilon)k_1VT_1 - \delta T_1^* - m_1ET_1^*,\\
\dot T_2^* &= (1-f\epsilon)k_2VT_2 - \delta T_2^* - m_2ET_2^*,\\
\dot V &= N_T\delta(T_1^* + T_2^*) - cV - [(1-\epsilon)\rho_1k_1T_1 + (1-f\epsilon)\rho_2k_2T_2]V,\\
\dot E &= \lambda_E + \frac{b_E(T_1^* + T_2^*)}{T_1^* + T_2^* + K_b}E - \frac{d_E(T_1^* + T_2^*)}{T_1^* + T_2^* + K_d}E - \delta_EE.
\end{aligned}
\tag{3.15}
$$

This is a slightly simplified version of (2.20) in which we neglect the contributions of
noninfectious virus $V_{NI}$ and protease inhibitors $\epsilon_2$. As detailed in Section 2.5.1, $T_1$
and $T_1^*$ represent the populations of uninfected and infected T-lymphocytes, $T_2$ and
$T_2^*$ are corresponding macrophage populations, and $V$ and $E$ denote the populations of
free virus and immune effector cells.

The physical interpretation of the parameters is detailed in Table 2.1, and the
values compiled in Table 3.1 are used in the examples of Chapters 3 and 9. The
origin and units for these values are given in Table 1 of [2]. The initial conditions
for subsequent examples are taken to be

$$
T_1 = 0.9 \times 10^6, \quad T_2 = 4000, \quad T_1^* = 0.1, \quad T_2^* = 0.1, \quad V = 1
$$
Table 3.1. Parameter values from Table 1 of [2].

  $\lambda_1 = 1 \times 10^4$    $d_1 = 0.01$                $\epsilon = 0$               $k_1 = 8 \times 10^{-7}$
  $\lambda_2 = 31.98$           $d_2 = 0.01$                $f = 0.34$                   $k_2 = 1 \times 10^{-4}$
  $\delta = 0.7$                $m_1 = 1 \times 10^{-5}$    $m_2 = 1 \times 10^{-5}$     $N_T = 100$
  $c = 13$                      $\rho_1 = 1$                $\rho_2 = 1$                 $\lambda_E = 1$
  $b_E = 0.3$                   $\delta_E = 0.1$            $K_b = 100$                  $d_E = 0.25$
  $K_d = 500$


Readers are referred to [2] for details regarding the use of this model to guide the
development of treatment strategies for HIV.
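A minimal Python sketch of the right-hand side of (3.15) with the parameter values of Table 3.1, paired with a fixed-step fourth-order Runge-Kutta integrator for a short horizon. The initial condition for $E$ is not listed above, so $E(0) = 10$ here is a hypothetical placeholder. A convenient consistency check is the uninfected steady state $T_1 = \lambda_1/d_1$, $T_2 = \lambda_2/d_2$, $T_1^* = T_2^* = V = 0$, $E = \lambda_E/\delta_E$, at which every component of the right-hand side vanishes.

```python
import math

# Parameter values as listed in Table 3.1.
PARAMS = {
    "lam1": 1e4, "d1": 0.01, "eps": 0.0, "k1": 8e-7,
    "lam2": 31.98, "d2": 0.01, "f": 0.34, "k2": 1e-4,
    "delta": 0.7, "m1": 1e-5, "m2": 1e-5, "NT": 100.0,
    "c": 13.0, "rho1": 1.0, "rho2": 1.0, "lamE": 1.0,
    "bE": 0.3, "deltaE": 0.1, "Kb": 100.0, "dE": 0.25, "Kd": 500.0,
}


def hiv_rhs(x, p=PARAMS):
    """Right-hand side of the simplified HIV model (3.15); x = [T1, T2, T1*, T2*, V, E]."""
    T1, T2, T1s, T2s, V, E = x
    inf1 = (1.0 - p["eps"]) * p["k1"] * V * T1             # new type 1 infections
    inf2 = (1.0 - p["f"] * p["eps"]) * p["k2"] * V * T2    # new type 2 infections
    Ts = T1s + T2s                                         # total infected cells
    return [
        p["lam1"] - p["d1"] * T1 - inf1,
        p["lam2"] - p["d2"] * T2 - inf2,
        inf1 - p["delta"] * T1s - p["m1"] * E * T1s,
        inf2 - p["delta"] * T2s - p["m2"] * E * T2s,
        p["NT"] * p["delta"] * Ts - p["c"] * V - (p["rho1"] * inf1 + p["rho2"] * inf2),
        p["lamE"] + p["bE"] * Ts / (Ts + p["Kb"]) * E
        - p["dE"] * Ts / (Ts + p["Kd"]) * E - p["deltaE"] * E,
    ]


def rk4(rhs, x0, t_final, n_steps):
    """Fixed-step classical Runge-Kutta; adequate only for short horizons in this sketch."""
    h = t_final / n_steps
    x = list(x0)
    for _ in range(n_steps):
        k1 = rhs(x)
        k2 = rhs([xi + 0.5 * h * ki for xi, ki in zip(x, k1)])
        k3 = rhs([xi + 0.5 * h * ki for xi, ki in zip(x, k2)])
        k4 = rhs([xi + h * ki for xi, ki in zip(x, k3)])
        x = [xi + h * (a + 2.0 * b + 2.0 * c + d) / 6.0
             for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]
    return x


# Initial conditions from the text; E(0) = 10 is a hypothetical placeholder.
x0 = [0.9e6, 4000.0, 0.1, 0.1, 1.0, 10.0]
x1 = rk4(hiv_rhs, x0, t_final=1.0, n_steps=100)
print("state after one day:", [f"{v:.4g}" for v in x1])

# Uninfected steady state: every component of the right-hand side should vanish.
p = PARAMS
x_eq = [p["lam1"] / p["d1"], p["lam2"] / p["d2"], 0.0, 0.0, 0.0, p["lamE"] / p["deltaE"]]
print("max |rhs| at uninfected equilibrium:", max(abs(v) for v in hiv_rhs(x_eq)))
```

Because the states span many orders of magnitude, longer simulations would likely call for an adaptive or implicit solver, possibly on logarithmic scales; the fixed step here is chosen only for a brief illustration.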


Example 3.4 (SIR Model). Fundamental aspects of disease dynamics are quan-
tified by the SIR model

$$
\begin{aligned}
\frac{dS}{dt} &= \delta N - \delta S - \gamma kIS, & S(0) &= S_0,\\
\frac{dI}{dt} &= \gamma kIS - (r + \delta)I, & I(0) &= I_0,\\
\frac{dR}{dt} &= rI - \delta R, & R(0) &= R_0,
\end{aligned}
\tag{3.17}
$$

where $S(t)$, $I(t)$, and $R(t)$ are the numbers of susceptible, infectious, and recovered
individuals in a population of size $N$. Here $\gamma$, $k$, and $r$ respectively denote the
infection coefficient, the interaction coefficient, which quantifies the probability that
an individual comes in contact with others, and the recovery rate. The birth and
death rates are assumed to be equal, with both denoted by $\delta$. It is observed that

$$
\frac{dS}{dt} + \frac{dI}{dt} + \frac{dR}{dt} = \delta(N - S - I - R),
$$

which vanishes when $S_0 + I_0 + R_0 = N$, so that the total population $S(t) + I(t) + R(t) = N$ is constant.
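The conservation property can be confirmed numerically. The Python sketch below integrates (3.17) with a fixed-step fourth-order Runge-Kutta scheme for hypothetical parameter values and initial conditions satisfying $S_0 + I_0 + R_0 = N$. Since $s = S + I + R$ obeys the linear equation $s' = \delta(N - s)$, every Runge-Kutta stage preserves $s = N$, so the computed total should stay at $N$ up to round-off.

```python
def sir_rhs(state, N, gamma, k, r, delta):
    """Right-hand side of the SIR model (3.17)."""
    S, I, R = state
    infection = gamma * k * I * S
    return [delta * N - delta * S - infection,
            infection - (r + delta) * I,
            r * I - delta * R]


def simulate_sir(state0, params, t_final, n_steps):
    """Fixed-step fourth-order Runge-Kutta integration of (3.17)."""
    h = t_final / n_steps
    x = list(state0)
    for _ in range(n_steps):
        k1 = sir_rhs(x, **params)
        k2 = sir_rhs([xi + 0.5 * h * ki for xi, ki in zip(x, k1)], **params)
        k3 = sir_rhs([xi + 0.5 * h * ki for xi, ki in zip(x, k2)], **params)
        k4 = sir_rhs([xi + h * ki for xi, ki in zip(x, k3)], **params)
        x = [xi + h * (a + 2.0 * b + 2.0 * c + d) / 6.0
             for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]
    return x


# Hypothetical values: N = 1000, gamma * k * N = 0.1, r = 0.05, delta = 0.01.
params = {"N": 1000.0, "gamma": 0.2, "k": 5e-4, "r": 0.05, "delta": 0.01}
S, I, R = simulate_sir([990.0, 10.0, 0.0], params, t_final=200.0, n_steps=2000)
print(f"S = {S:.2f}, I = {I:.2f}, R = {R:.2f}, total = {S + I + R:.10f}")
```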


Example 3.5 (Heat Equation). The first law of thermodynamics and conser-
vation of energy were used to quantify heat conduction and convection in the at-
mospheric physics equations (2.7) and the thermal-hydraulic equations (2.17) used to model the
flow and heating of coolant in a nuclear reactor. The following time-dependent
and steady state heat equations encapsulate the prototypical behavior of both heat
conduction and diffusion processes.

For the experimental configuration, which we will consider in Chapters 7 and 8
when quantifying parameter uncertainty, we consider copper and aluminum rectan-
gular, uninsulated rods with cross-sectional dimensions $a = b = 0.95$ cm and length
$L = 70$ cm. A heat source provides a fixed, but unknown, heat flux at $x = 0$. As
detailed in [26], an energy balance yields the model

$$
\rho c_p\frac{\partial T}{\partial t} = \frac{\partial}{\partial x}\left(k\frac{\partial T}{\partial x}\right) - \frac{2(a+b)h}{ab}\left[T(t,x) - T_{amb}\right], \quad 0 < x < L,
\tag{3.18}
$$
and boundary conditions

$$
k\frac{\partial T}{\partial x}(t,0) = \Phi, \quad k\frac{\partial T}{\partial x}(t,L) = h\left[T_{amb} - T(t,L)\right].
\tag{3.19}
$$

Here $T$, $\rho$, $c_p$, $k$, $h$, and $T_{amb}$ respectively denote the temperature, density, specific
heat, thermal conductivity, convective heat transfer coefficient, and ambient room
temperature, and $\Phi$ is the source heat flux. Initial conditions are specified as

$$
T(0,x) = T_0(x).
\tag{3.20}
$$


In experiments, temperatures are allowed to equilibrate to a steady state,
which is initially modeled by the boundary value problem

$$
\begin{aligned}
&\frac{d^2T_s}{dx^2} = \frac{2(a+b)h}{abk}\left[T_s(x) - T_{amb}\right], \quad 0 < x < L,\\
&\frac{dT_s}{dx}(0) = \frac{\Phi}{k}, \quad \frac{dT_s}{dx}(L) = \frac{h}{k}\left[T_{amb} - T_s(L)\right].
\end{aligned}
\tag{3.21}
$$

It is clear that the three parameters $\Phi$, $h$, and $k$ are not uniquely determined since (3.21)
can be reformulated in terms of the two parameters $\tilde h = \frac{h}{k}$ and $\tilde\Phi = \frac{\Phi}{k}$. Because the
thermal conductivity $k$ is known for aluminum and copper, we respectively use the
values $k = 2.37$ W/(cm °C) and $k = 4.01$ W/(cm °C) when modeling the aluminum and copper
rods. These values lie within the ranges 2.04 to 2.50 and 3.53 to 4.01 W/(cm °C) reported
for aluminum and copper. The source heat flux $\Phi$ and convective heat transfer
coefficient $h$ are unknown, so the parameter set to be estimated and statistically
analyzed is $q = [\Phi, h]$.

The solution of (3.21) is

$$
T_s(x,q) = c_1(q)e^{-\gamma x} + c_2(q)e^{\gamma x} + T_{amb},
\tag{3.22}
$$

where $\gamma = \sqrt{\frac{2(a+b)h}{abk}}$ and

$$
c_1(q) = -\frac{\Phi}{k\gamma}\left[\frac{e^{\gamma L}(h + k\gamma)}{e^{-\gamma L}(h - k\gamma) + e^{\gamma L}(h + k\gamma)}\right], \quad c_2(q) = \frac{\Phi}{k\gamma} + c_1(q).
\tag{3.23}
$$

We suppress the parameter dependence of $\gamma$ to clarify the notation. Observations for
this experiment consist of temperature measurements at 15 equally spaced spatial
locations $x_j = x_0 + (j-1)\Delta x$, $j = 1, \ldots, 15$, where $x_0 = 10$ cm and $\Delta x = 4$ cm.

Steady state temperature data for rectangular, uninsulated aluminum and copper
rods is compiled in Tables 3.2 and 3.3.
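A minimal Python sketch of the steady state model: it evaluates (3.22)-(3.23) at the 15 observation locations for hypothetical values of $q = [\Phi, h]$ and ambient temperature (not estimates from the data). The default geometry $a = b = 0.95$ cm, $L = 70$ cm and the aluminum conductivity $k = 2.37$ are assumed values taken as the rod description above; the accompanying checks confirm numerically that the formula satisfies the differential equation and both boundary conditions of (3.21).

```python
import math


def steady_state_temperature(x, phi, h, k=2.37, a=0.95, b=0.95, L=70.0, T_amb=21.0):
    """Steady state solution (3.22)-(3.23) of the boundary value problem (3.21).

    Lengths in cm and k in W/(cm C); phi, h, and T_amb are supplied by the caller.
    """
    gamma = math.sqrt(2.0 * (a + b) * h / (a * b * k))
    egl, emgl = math.exp(gamma * L), math.exp(-gamma * L)
    denom = emgl * (h - k * gamma) + egl * (h + k * gamma)
    c1 = -phi / (k * gamma) * egl * (h + k * gamma) / denom
    c2 = phi / (k * gamma) + c1
    return c1 * math.exp(-gamma * x) + c2 * math.exp(gamma * x) + T_amb


# Hypothetical parameter values q = [phi, h].
phi, h = -18.0, 0.002
xs = [10.0 + 4.0 * j for j in range(15)]     # x_j = x_0 + (j - 1) dx with x_0 = 10, dx = 4
print([round(steady_state_temperature(x, phi, h), 2) for x in xs])
```

With a negative flux $\Phi$, the computed profile decreases from the heated end toward the ambient temperature, matching the qualitative trend of the tabulated data.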

We develop and analyze an extension of this steady state model in Exam-
ple 12.4 of Section 12.2.

More generally, we will consider the model

$$
\begin{aligned}
&\frac{\partial T}{\partial t} = \frac{\partial}{\partial x}\left(\alpha(x)\frac{\partial T}{\partial x}\right) + f(t,x), \quad 0 < x < \ell, \; t > 0,\\
&T(t,0) = T_\ell, \quad T(t,\ell) = T_r,\\
&T(0,x) = T_0(x),
\end{aligned}
\tag{3.24}
$$

where $\alpha(x)$ is a spatially varying thermal diffusivity and $f(t,x)$ is a distributed
source term. Uncertainties in initial and boundary conditions can be interpreted as
random variables, whereas it is illustrated in Section 4.5 that a random field
can be used to incorporate uncertainty in the spatially varying diffusivity.

Table 3.2. Steady state temperatures measured at locations x for an aluminum rod.

  x (cm)      10     14     18     22     26     30     34     38
  Temp (°C)   90.14  80.12  67.06  57.90  50.90  44.84  39.75  36.16

  x (cm)      42     46     50     54     58     62     66
  Temp (°C)   33.31  31.15  29.28  27.88  27.18  26.40  25.86

Table 3.3. Steady state temperatures measured at locations x for a copper rod.

  x (cm)      10     14     18     22     26     30     34     38
  Temp (°C)   66.04  60.04  54.81  50.42  46.74  43.66  40.76  38.49

  x (cm)      42     46     50     54     58     62     66
  Temp (°C)   36.12  34.77  33.18  32.36  31.56  30.61  30.56


Example 3.6 (Neutron Diffusion). In Section 2.4.2, we summarized the neutron
transport equations in the absence of diffusion. Steady state 1-D neutron diffusion
in a material of width $2a$ can be modeled as

$$
-D\frac{d^2\phi}{dx^2} + A_a\phi = S, \quad x \in (-a, a),
\tag{3.25}
$$

where $\phi$, $D$, $A_a$, and $S$ respectively denote the neutron flux, a diffusion coefficient,
a macroscopic absorption cross-section, and a constant distributed source [50].
Whereas the notation $\Sigma_a$ is commonly employed in nuclear physics, we use $A_a$
to avoid confusion with summations. The flux is assumed to vanish at $x = \pm a$,
yielding the boundary conditions

$$
\phi(-a) = \phi(a) = 0.
\tag{3.26}
$$

As illustrated in Figure 3.1(a), the response is measured using a detector with area
$A_d$ located at $x = b$, so that

$$
y = A_d\phi(b).
\tag{3.27}
$$

Figure 3.1. (a) Material geometry and (b) finite difference grid with $h = 2a/N$, $N = 8$.
The parameters are $q = [A_a, D, S, A_d]$, and the solution to (3.25) with the boundary
conditions (3.26) is

$$
\phi(x) = \frac{S}{A_a}\left(1 - \frac{\cosh(kx)}{\cosh(ka)}\right), \quad k = \sqrt{A_a/D}.
\tag{3.28}
$$

It was noted in Section 2.4.2 that cross-section measurements and uncertainties for
numerous materials have been catalogued for the last 70 years. Depending on the
application, uncertainties in $D$ and $S$ can be determined using the techniques of
Chapters 7 and 8.

To discretize (3.25), we consider a mesh $x_i = -a + ih$, where $h = \frac{2a}{N}$ and
$N = 8$, as shown in Figure 3.1(b). We assume that the observation location $b$ is at
$\frac{a}{2}$ so that it coincides with $x_6$. Note that this choice of $N$ and $b$ was made solely to
permit the specific description of the observation functional, and general choices for
$N$ and $b$ are equally valid. Finally, we let $\phi_i \approx \phi(x_i)$ denote approximate solutions
so that the solution vector is $\phi = [\phi_1, \ldots, \phi_{N-1}]^T$.

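Discretization of the second derivative term using a central difference Taylor
approximation and enforcement of the boundary conditions $\phi_0 = \phi_N = 0$ yields

$$
-D\frac{\phi_{i+1} - 2\phi_i + \phi_{i-1}}{h^2} + A_a\phi_i = S, \quad i = 1, \ldots, N-1,
\tag{3.29}
$$

which can be formulated as

$$
\begin{bmatrix}
h^2A_a + 2D & -D & & \\
-D & h^2A_a + 2D & \ddots & \\
 & \ddots & \ddots & -D \\
 & & -D & h^2A_a + 2D
\end{bmatrix}
\begin{bmatrix} \phi_1 \\ \phi_2 \\ \vdots \\ \phi_{N-1} \end{bmatrix}
=
\begin{bmatrix} h^2S \\ h^2S \\ \vdots \\ h^2S \end{bmatrix}
$$

or

$$
A(q)\phi = s(q).
\tag{3.30}
$$

The response (3.27) can be formulated as

$$
y = C^T(q)\phi = C^T(q)A^{-1}(q)s(q).
\tag{3.31}
$$

For $N = 8$ and $b = \frac{a}{2}$, the observation vector is

$$
C^T(q) = [0, 0, 0, 0, 0, A_d, 0].
\tag{3.32}
$$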
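A minimal Python sketch of the finite difference model: it assembles the tridiagonal system (3.29)-(3.30), solves it with the Thomas algorithm (a standard choice for tridiagonal matrices, not one prescribed by the text), forms the response $y = A_d\phi_6$ from (3.31)-(3.32), and compares the discrete flux at $b = a/2$ with the analytic solution (3.28). All parameter values are hypothetical.

```python
import math


def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm (no pivoting; the matrix here is diagonally dominant)."""
    n = len(diag)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0] = upper[0] / diag[0] if n > 1 else 0.0
    dp[0] = rhs[0] / diag[0]
    for i in range(1, n):                                # forward elimination
        denom = diag[i] - lower[i - 1] * cp[i - 1]
        cp[i] = upper[i] / denom if i < n - 1 else 0.0
        dp[i] = (rhs[i] - lower[i - 1] * dp[i - 1]) / denom
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):                       # back substitution
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


def neutron_fd(A_a, D, S, a, N):
    """Assemble and solve A(q) phi = s(q) from (3.29)-(3.30); returns x_1..x_{N-1} and phi."""
    h = 2.0 * a / N
    n = N - 1
    diag = [h * h * A_a + 2.0 * D] * n
    off = [-D] * (n - 1)
    rhs = [h * h * S] * n
    return [-a + i * h for i in range(1, N)], solve_tridiagonal(off, diag, off, rhs)


def neutron_exact(x, A_a, D, S, a):
    """Analytic flux (3.28)."""
    k = math.sqrt(A_a / D)
    return S / A_a * (1.0 - math.cosh(k * x) / math.cosh(k * a))


# Hypothetical parameters q = [A_a, D, S, A_d] and half-width a.
A_a, D, S, A_d, a = 1.0, 1.0, 1.0, 0.5, 1.0
nodes, phi = neutron_fd(A_a, D, S, a, N=8)
y = A_d * phi[5]                      # C^T phi with C^T = [0, 0, 0, 0, 0, A_d, 0], x_6 = a/2
print(nodes[5], y, A_d * neutron_exact(a / 2, A_a, D, S, a))
```

Refining the grid (with $N$ a multiple of 4 so that $a/2$ stays a node) reduces the discrepancy at the expected second-order rate.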

Example 3.7 (Beam Equation). Consider a thin cantilever beam driven by a
voltage spike $V(t)$ applied to a surface-mounted piezoelectric actuator, as shown
in Figure 3.2. The beam is clamped at $x = 0$, and the location of the patch is
designated by $[x_1, x_2]$. We let $b$, $h$, and $L$ respectively designate the width, thickness,
and length of the beam, and $b_p$, $h_p$ denote the patch width and thickness. Transverse
beam displacements are denoted by $w(t,x)$, and $f(t,x)$ denotes a general transverse
force applied to the beam.
Figure 3.2. Thin beam driven by a surface-mounted piezoelectric patch, with dis-
placements $y_i$ measured at $x = 128$ mm.


It is shown in [225] that displacements can be modeled by the Euler-Bernoulli
equation

$$
\begin{aligned}
&\rho(x)\frac{\partial^2 w}{\partial t^2} + \gamma\frac{\partial w}{\partial t} + \frac{\partial^2 M}{\partial x^2} = f(t,x), \quad 0 < x < L, \; t > 0,\\
&w(t,0) = \frac{\partial w}{\partial x}(t,0) = 0, \quad M(t,L) = \frac{\partial M}{\partial x}(t,L) = 0, \quad t > 0,\\
&w(0,x) = w_0(x), \quad \frac{\partial w}{\partial t}(0,x) = w_1(x), \quad 0 < x < L,
\end{aligned}
\tag{3.33}
$$

where $\gamma$ is an air damping coefficient and the moment is

$$
M(t,x) = YI(x)\frac{\partial^2 w}{\partial x^2} + cI(x)\frac{\partial^3 w}{\partial x^2\partial t}.
$$

To accommodate the differing geometry and material properties in the region cov-
ered by the patch, the density, stiffness, and damping terms are given by the piece-
wise constant relations

$$
\rho(x) = \rho hb + \rho_ph_pb_p\chi_p(x), \quad YI(x) = YI + Y_pI_p\chi_p(x), \quad cI(x) = cI + c_pI_p\chi_p(x).
\tag{3.34}
$$

Here $\rho$, $\rho_p$, $Y$, $Y_p$ and $c$, $c_p$ are the density, Young's modulus, and Kelvin-Voigt
damping coefficients for the beam and patch, and the characteristic function

$$
\chi_p(x) = \begin{cases} 1, & x \in [x_1, x_2], \\ 0, & \text{otherwise}, \end{cases}
$$

isolates the region covered by the patch. The moment of inertia $I$ and constant $I_p$
are given by

$$
I = b\int_{-h/2}^{h/2} z^2\, dz = \frac{h^3b}{12}, \quad I_p = b_p\int_{h/2}^{h/2+h_p} z^2\, dz = \frac{b_p}{3}\left[(h/2 + h_p)^3 - (h/2)^3\right].
\tag{3.35}
$$

The force generated by the application of a small voltage $V(t)$ to the patch can be
approximated by

$$
k_pV(t)\frac{\partial^2\chi_p}{\partial x^2}(x).
$$

To avoid issues associated with differentiating the piecewise constant charac-
teristic function and to facilitate subsequent numerical implementation, one employs
the weak model formulation

$$
\int_0^L\left[\rho(x)\frac{\partial^2 w}{\partial t^2}\phi + \gamma\frac{\partial w}{\partial t}\phi + YI(x)\frac{\partial^2 w}{\partial x^2}\phi'' + cI(x)\frac{\partial^3 w}{\partial x^2\partial t}\phi''\right]dx = k_pV(t)\int_{x_1}^{x_2}\phi''\, dx,
$$

which must hold for all test functions $\phi \in V = \{\phi \in H^2(0,L) \mid \phi(0) = \phi'(0) = 0\}$.

Table 3.4. Geometry and material properties of the aluminum beam and QuickPack
piezoelectric actuator.

  Component       Geometry (mm)        Material Properties
  Beam            393 × 26 × 1.25      Y = 69 GPa, ρ = 2700 kg/m³
  PZT Actuator    51 × 26 × 0.4

The geometry and material properties of the beam and QuickPack piezoelectric actuator are compiled in Table 3.4, and the patch location is $x_1 = 41$ mm and $x_2 = 92$ mm. The values of $\rho_p$ and $Y_p$ are not well established since the QuickPack patch is a composite of a piezoelectric wafer embedded in an epoxy and Kapton matrix. Furthermore, the relations (3.34) and (3.35) neglect the glue layer. Hence we treat the linear density $\rho_p h_p b_p$ and the stiffness terms $YI_p = Y_p I_p$ and $YI_b = YI$ as parameters that must be estimated through a fit to data. Similarly, the parameters $cI_b = cI$, $cI_p = c_p I_p$, $\gamma$, and $k_p$ must be estimated since there are no published values for these quantities. We fix $\rho_b = \rho h b = 0.08775$ kg/m to preserve the identifiability of the model. The parameter set is thus

$$q = [\rho_p h_p b_p, YI_p, YI_b, cI_b, cI_p, \gamma, k_p]. \qquad (3.36)$$

Observations consist of displacement measurements $y_i$ collected using a proximity sensor at the spatial location $\bar{x} = 128$ mm. The corresponding model response is thus $y(t_i; q) = w(t_i, \bar{x})$.
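The fixed linear density and the inertia terms in (3.35) can be checked from Table 3.4. The following Python sketch assumes SI units, takes the patch width $b_p$ to be the 26 mm listed in the table, and uses hypothetical helper names (`chi_p`, `YI_of_x`) chosen only for illustration; it is not the authors' implementation.

```python
# Hypothetical sketch: quantities implied by Table 3.4 and (3.35), in SI units.
# Assumptions (not from the text): patch width b_p equals the 26 mm in
# Table 3.4, and the patch occupies [x1, x2] = [0.041, 0.092] m.

b, h = 0.026, 1.25e-3            # beam width and thickness (m)
b_p, h_p = 0.026, 0.4e-3         # patch width and thickness (m)
Y, rho = 69e9, 2700.0            # beam Young's modulus (Pa) and density (kg/m^3)
x1, x2 = 0.041, 0.092            # patch location (m)

# Beam linear density rho*h*b, which is fixed rather than estimated
rho_b = rho * h * b

# Moments of inertia from (3.35)
I = h**3 * b / 12.0
I_p = b_p / 3.0 * ((h / 2 + h_p) ** 3 - (h / 2) ** 3)


def chi_p(x):
    """Characteristic function of the patch region [x1, x2]."""
    return 1.0 if x1 <= x <= x2 else 0.0


def YI_of_x(x, YI_p):
    """Piecewise constant stiffness YI(x) = YI + YI_p * chi_p(x).

    YI_p = Y_p * I_p is passed in because Y_p is a parameter to be estimated.
    """
    return Y * I + YI_p * chi_p(x)


print(f"rho*h*b = {rho_b:.5f} kg/m")   # rho*h*b = 0.08775 kg/m
print(f"I = {I:.4e} m^4, I_p = {I_p:.4e} m^4")
```

Passing `YI_p` as an argument mirrors the fact that the patch stiffness is estimated rather than computed from published material data.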

Example 3.8 (Burgers' Equation). The nonlinear equations (2.4) and (2.10), obtained by balancing momentum, quantify atmospheric and nuclear coolant flow processes. To provide a simple prototype, we consider the viscous Burgers' equation

$$\frac{\partial u}{\partial t} + u\frac{\partial u}{\partial x} = \mu\frac{\partial^2 u}{\partial x^2},$$
$$u(t,0) = u_\ell, \quad u(t,1) = u_r,$$
$$u(0,x) = u_0(x), \qquad (3.37)$$

which provides an example that has both nonlinear state and parameter dependence. It can exhibit uncertainty in the viscosity coefficient $\mu$, the boundary terms $u_\ell$ and $u_r$, and the initial condition $u_0(x)$.
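To make the nonlinear state dependence concrete, here is a minimal explicit finite-difference sketch of (3.37). It assumes central differences in space, forward Euler in time, and a step size chosen so that each update is a convex combination of neighboring values (this needs $|u|h/(2\mu) \le 1$ and $\Delta t \le h^2/(2\mu)$). The function name `burgers_explicit` and the test data are hypothetical.

```python
# Minimal explicit finite-difference sketch for the viscous Burgers' equation
# (3.37) on 0 < x < 1 with Dirichlet data u(t,0) = u_left, u(t,1) = u_right.
import math


def burgers_explicit(u0, mu, u_left, u_right, t_final, n_cells=50):
    h = 1.0 / n_cells
    dt = 0.4 * h * h / mu                  # satisfies dt <= h^2 / (2 mu)
    x = [i * h for i in range(n_cells + 1)]
    u = [u0(xi) for xi in x]
    u[0], u[-1] = u_left, u_right          # impose boundary values
    t = 0.0
    while t < t_final - 1e-12:
        step = min(dt, t_final - t)        # shorten the last step if needed
        new = u[:]
        for i in range(1, n_cells):
            advect = u[i] * (u[i + 1] - u[i - 1]) / (2 * h)       # u u_x
            diffuse = mu * (u[i + 1] - 2 * u[i] + u[i - 1]) / h**2  # mu u_xx
            new[i] = u[i] + step * (diffuse - advect)
        u = new
        t += step
    return x, u


# Smooth initial data consistent with u(t,0) = 1 and u(t,1) = 0
x, u = burgers_explicit(lambda s: 0.5 * (1 + math.cos(math.pi * s)),
                        mu=0.1, u_left=1.0, u_right=0.0, t_final=0.5)
```

Because the advection coefficient is the state itself, changing $u_0$, $u_\ell$, or $u_r$ changes the effective transport speed, which is the nonlinear state dependence noted above.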


3.2 Evolution, Stationary, and Algebraic Models

The models in Section 3.1 can be generally categorized as evolution processes, stationary processes, or algebraic models. We summarize these general frameworks to provide the setting that we will employ for model calibration, uncertainty quantification, and model validation.


Evolution Processes 


The exponential model (3.1), simple harmonic oscillator model (3.5), HIV model (3.15), and SIR model (3.17) are finite-dimensional evolution processes, whereas the heat equations (3.18) and (3.24), beam equation (3.33), and Burgers' equation (3.37) are infinite-dimensional evolution models. All can be formulated as

$$\frac{du}{dt} = f(t, u(t), q), \qquad u(t_0) = u_0, \qquad (3.38)$$

where $q \in \mathbb{R}^p$ is a vector of parameter values and $u$ is the state. For ODE models or spatial discretizations of PDEs (e.g., finite element, finite difference, or finite volume), the state $u(t, q) \in \mathbb{R}^N$ is a finite-dimensional vector. More generally, one can consider $u$ in a suitable Hilbert or Banach space, in which case (3.38) additionally represents evolutionary PDEs [78, 190]. Since we are focusing on computational algorithms, we will consider only the finite-dimensional case in this chapter.

In general, we cannot observe all of the states $u$ and instead consider continuous or discrete-time observations

$$y(t, q) = Cu(t, q) \quad \text{or} \quad y(t_j, q) = Cu(t_j, q), \quad j = 1, \ldots, n. \qquad (3.39)$$

We consider $C$ to be a $\nu \times N$ matrix, which yields $\nu$ observations or responses $y(t_j, q)$. Note that $\nu$ and $N$ respectively designate the dimension of the response and the number of states. More general response relations are illustrated in (3.13) and Chapter 14.
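The structure of (3.38)-(3.39) can be illustrated with a short sketch: integrate $du/dt = f(t, u, q)$ with classical fourth-order Runge-Kutta and apply an observation matrix $C$ at the times $t_j$. For illustration only, it assumes the unforced spring model $\ddot z + C\dot z + Kz = 0$ written as a first-order system with state $u = [z, \dot z]$, with hypothetical values $C = 0.2$, $K = 4$, and only the position observed ($\nu = 1$, $N = 2$).

```python
# Sketch of (3.38)-(3.39): RK4 integration of du/dt = f(t, u, q) followed by
# discrete observations y(t_j, q) = C u(t_j, q). Parameter values are hypothetical.
import math


def f(t, u, q):
    """Unforced spring model as a first-order system, u = [z, dz/dt]."""
    C_damp, K = q
    z, v = u
    return [v, -C_damp * v - K * z]


def rk4(f, u0, q, times, substeps=50):
    """Return the state at each time in `times` (times[0] is t_0)."""
    states, u, t = [list(u0)], list(u0), times[0]
    for t_next in times[1:]:
        dt = (t_next - t) / substeps
        for _ in range(substeps):
            k1 = f(t, u, q)
            k2 = f(t + dt / 2, [a + dt / 2 * b for a, b in zip(u, k1)], q)
            k3 = f(t + dt / 2, [a + dt / 2 * b for a, b in zip(u, k2)], q)
            k4 = f(t + dt, [a + dt * b for a, b in zip(u, k3)], q)
            u = [a + dt / 6 * (b1 + 2 * b2 + 2 * b3 + b4)
                 for a, b1, b2, b3, b4 in zip(u, k1, k2, k3, k4)]
            t += dt
        states.append(u)
    return states


C_obs = [[1.0, 0.0]]                    # nu x N observation matrix: position only
q = (0.2, 4.0)                          # hypothetical damping C and stiffness K
times = [0.1 * j for j in range(51)]
states = rk4(f, [1.0, 0.0], q, times)
y = [sum(c * s for c, s in zip(C_obs[0], u)) for u in states]
```

Although $f$ is linear in the state here, the observations $y(t_j, q)$ depend nonlinearly on $C$ and $K$ through the damped oscillation frequency and decay rate.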

We note that in most mathematical models for physical and biological applications, the solution $u(t, q)$ will exhibit a nonlinear dependence on the parameters $q$ even though the model may be linear with respect to the state $u$. Hence observations $y(t, q)$ also typically exhibit a nonlinear parameter dependence.

As illustrated in Example 3.3, the individual states in coupled biological and physical models can vary over several orders of magnitude. This can degrade the efficiency of numerical integration and optimization routines and produce unrealistic results if physically positive parameters become negative due to roundoff errors. In worst case scenarios, roundoff errors due to discrepant magnitude differences can stop integration routines if they produce nonphysical growth or imaginary components.

Many of these issues can be avoided by replacing (3.38) by a log-transformed system. If we let $\tilde{u}_i = \log_{10}(u_i)$, this yields the system


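$$\frac{d\tilde{u}_i}{dt} = \frac{1}{10^{\tilde{u}_i}\ln(10)}\, f_i\big(t, 10^{\tilde{u}(t)}, q\big), \qquad \tilde{u}_i(t_0) = \log_{10}(u_{0,i}), \qquad (3.40)$$

for $i = 1, \ldots, N$, where $10^{\tilde{u}}$ is applied componentwise. We employ transformed systems of this form in Chapters 7 and 8.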
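A small sketch of (3.40) follows. It assumes, purely for illustration, two decoupled decay equations $du_i/dt = -k_i u_i$ whose states start ten orders of magnitude apart; the transformed states are integrated with RK4 and then mapped back by $u_i = 10^{\tilde u_i}$, which keeps them positive by construction.

```python
# Sketch of the log-transformed system (3.40) for a hypothetical two-state
# decay model du_i/dt = -k_i u_i with widely separated magnitudes.
import math


def f(t, u, q):
    return [-k * ui for k, ui in zip(q, u)]


def f_log(t, ut, q):
    """d(tilde u_i)/dt = f_i(t, 10**tilde_u, q) / (10**tilde_u_i * ln 10)."""
    u = [10.0 ** v for v in ut]
    return [fi / (ui * math.log(10.0)) for fi, ui in zip(f(t, u, q), u)]


def rk4(rhs, y0, q, t_final, n_steps):
    y, t, dt = list(y0), 0.0, t_final / n_steps
    for _ in range(n_steps):
        k1 = rhs(t, y, q)
        k2 = rhs(t + dt / 2, [a + dt / 2 * b for a, b in zip(y, k1)], q)
        k3 = rhs(t + dt / 2, [a + dt / 2 * b for a, b in zip(y, k2)], q)
        k4 = rhs(t + dt, [a + dt * b for a, b in zip(y, k3)], q)
        y = [a + dt / 6 * (b1 + 2 * b2 + 2 * b3 + b4)
             for a, b1, b2, b3, b4 in zip(y, k1, k2, k3, k4)]
        t += dt
    return y


q = [0.5, 2.0]                          # hypothetical rate constants
u0 = [1.0e6, 1.0e-4]                    # states ten orders of magnitude apart
ut_final = rk4(f_log, [math.log10(v) for v in u0], q, t_final=3.0, n_steps=300)
u_final = [10.0 ** v for v in ut_final]  # back-transform to original states
```

For this particular model the transformed right-hand side is constant, so RK4 reproduces the exact solution up to rounding; for nonlinear models the benefit is mainly the uniform scaling of the transformed states.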

Stationary Processes

The boundary value problems (3.21) and (3.25), used to model steady state heat conduction and neutron diffusion, are examples of stationary processes, as are elliptic PDEs. Stationary processes have the form

$$\mathcal{N}(u, q) = F(q), \quad x \in \mathcal{D}, \qquad B(u, q) = G(q), \quad x \in \partial\mathcal{D}, \qquad (3.41)$$

where $\mathcal{N}$ is a linear or nonlinear differential operator, $F$ denotes source terms, $B$ and $G$ denote boundary operators, and $\mathcal{D}$ is a region in $\mathbb{R}$, $\mathbb{R}^2$, or $\mathbb{R}^3$. The state $u$ can be $N$-dimensional but, in our examples, we will consider $u(x, q) \in \mathbb{R}$. We also consider only the case when observations

$$y(x_j, q) = u(x_j, q) \qquad (3.42)$$

are made at discrete points $x_j$, $j = 1, \ldots, n$.

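For the steady state heat equation (3.21), $u = T_s$, $q = [\Phi, h, k]$, and

$$\mathcal{N}(T_s, q) = -\frac{d^2 T_s}{dx^2} + \frac{2(a+b)h}{abk}T_s, \qquad F(q) = \frac{2(a+b)h}{abk}T_{\mathrm{amb}},$$

$$B(T_s, q) = \begin{cases} \dfrac{dT_s}{dx}, & x = 0, \\[4pt] \dfrac{dT_s}{dx} + \dfrac{h}{k}T_s, & x = L, \end{cases} \qquad G(q) = \begin{cases} \dfrac{\Phi}{k}, & x = 0, \\[4pt] \dfrac{h}{k}T_{\mathrm{amb}}, & x = L. \end{cases}$$

The observations are

$$y(x_j, q) = T_s(x_j, q) = c_1(q)e^{-\gamma x_j} + c_2(q)e^{\gamma x_j} + T_{\mathrm{amb}},$$

where $\gamma = \sqrt{2(a+b)h/(abk)}$ and $c_1(q)$ and $c_2(q)$ are defined in (3.23). For the experimental data employed for model calibration in later chapters, the points $x_j$ are given in Table 3.2.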
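The stationary model can be evaluated directly. This sketch does not reproduce (3.23); instead it obtains $c_1$ and $c_2$ by solving the two boundary conditions above as a $2 \times 2$ linear system (Cramer's rule). All parameter values and sensor locations are hypothetical placeholders, not the data of Table 3.2.

```python
# Sketch: steady-state rod temperature T(x) = c1 e^{-gamma x} + c2 e^{gamma x} + T_amb,
# with c1, c2 obtained from the boundary conditions of the stationary model.
# All numerical values below are hypothetical.
import math


def fin_temperature(Phi, h, k, a, b, L, T_amb):
    gamma = math.sqrt(2 * (a + b) * h / (a * b * k))
    # x = 0: dT/dx = Phi/k  ->  -gamma c1 + gamma c2 = Phi/k
    a11, a12, r1 = -gamma, gamma, Phi / k
    # x = L: dT/dx + (h/k) T = (h/k) T_amb  (the T_amb terms cancel)
    a21 = (-gamma + h / k) * math.exp(-gamma * L)
    a22 = (gamma + h / k) * math.exp(gamma * L)
    r2 = 0.0
    det = a11 * a22 - a12 * a21
    c1 = (r1 * a22 - a12 * r2) / det
    c2 = (a11 * r2 - r1 * a21) / det

    def T(x):
        return c1 * math.exp(-gamma * x) + c2 * math.exp(gamma * x) + T_amb

    def dT(x):
        return gamma * (-c1 * math.exp(-gamma * x) + c2 * math.exp(gamma * x))

    return T, dT, gamma


params = dict(Phi=-18.0, h=0.002, k=2.37, a=0.95, b=0.95, L=70.0, T_amb=21.0)
T, dT, gamma = fin_temperature(**params)
x_obs = [10.0 + 4.0 * j for j in range(15)]   # hypothetical sensor locations
y = [T(x) for x in x_obs]                      # y(x_j, q) = T_s(x_j, q)
```

Even though the operator is linear in $T_s$, the observations depend nonlinearly on $h$ and $k$ through $\gamma$ and the exponentials, consistent with the earlier remark on parameter dependence.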


Algebraic Models

Algebraic models arise when algebraic or polynomial relations are used to quantify physical or biological processes or result from the discretization of boundary value problems, as illustrated in Example 3.6. These models can be expressed as

$$\mathcal{N}(u, q) = 0,$$

where $\mathcal{N}$ is a nonlinear or linear operator. Two special cases are

$$A(q)u = F(q), \qquad A(u)q = F(u),$$

which represent linear dependence on the state and parameters, respectively.

3.3 Abstract Modeling Framework

The models in Examples 3.1-3.8 represent a range of algebraic relations, ODEs, and PDEs that exhibit both linear and nonlinear dependencies. We provide here a general framework for the models that facilitates model calibration, propagation of uncertainties, and sensitivity analysis. Readers who are not familiar with functional analysis can skip this section without losing the physical understanding of the models.


3.3.1 Linear Systems

We consider first systems that are linear in the state, or dependent variable, but typically exhibit a nonlinear dependence on inputs and parameters. Systems of $N$ coupled equations can be formulated as

$$L(q)u = F(q(\mathbf{x})), \qquad (3.43)$$

where $q(\mathbf{x}) = [q_1(\mathbf{x}), \ldots, q_p(\mathbf{x})]^T$ are the parameters, $u = [u_1(\mathbf{x}), \ldots, u_N(\mathbf{x})]^T$ is the state vector, $L(q) = [L_1(q), \ldots, L_N(q)]^T$ is a vector of operators that depend linearly on the state $u$ and typically nonlinearly on the parameters $q$, and $F(q) = [F_1(q), \ldots, F_N(q)]^T$ are source terms. Potential spatial or temporal dependence is indicated by $\mathbf{x} = [x, t] \in \mathcal{D} \times \mathcal{T} = \Omega$, where $\mathcal{D}$ is a subset of $\mathbb{R}$, $\mathbb{R}^2$, or $\mathbb{R}^3$ and $\mathcal{T}$ is a subset of $\mathbb{R}$. If $L$ represents differential operators, appropriate initial or boundary conditions are represented in operator form as

$$B(q)u = G(q), \quad \mathbf{x} \in \partial\Omega, \qquad (3.44)$$

where $\partial\Omega$ is the boundary of $\Omega$.

The general observation or response is represented by

$$y = \mathcal{R}(u, q) \qquad (3.45)$$

in the space $H = H_u \times H_q$, where $H_u$ and $H_q$ are Hilbert spaces for the state and parameters. Similarly, the sources $F$ are assumed to be elements in the Hilbert space $H_F$. We note that in general, $L$ will not be defined on all of $H_u$ but rather on a subspace $\mathrm{dom}(L) \subset H_u$; see Appendix A. For differential operators, $\mathrm{dom}(L)$ is often taken to be the subset of functions in a Sobolev space that satisfy boundary conditions.


Example 3.9. Consider the model (3.1) of Example 3.1. The state is $u$ in the Hilbert space $H_u = L^2(0, t_f)$ with the standard inner product. The parameters are $q = [\alpha, u_0, f(t)] \in \mathbb{R}^2 \times L^2(0, t_f)$ and $H_F = L^2(0, t_f)$. Since the independent variable is time, we have $\mathbf{x} = t \in [0, t_f]$. With the operator definitions

$$L(q)u = \frac{du}{dt} - \alpha u, \quad B(q)u = u, \quad F(q(\mathbf{x})) = f(t), \quad G(q) = u_0,$$

the model can be formulated in the general framework (3.43)-(3.44).


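Example 3.10. For the spring model of Example 3.2, appropriate operators are

$$L(q)u = \ddot{z} + C\dot{z} + Kz, \quad F(q) = F_0\cos(\omega_F t), \quad B(q)u = \begin{bmatrix} z \\ \dot{z} \end{bmatrix}, \quad G(q) = \begin{bmatrix} z_0 \\ z_1 \end{bmatrix}, \qquad (3.46)$$

where $u = z$ is considered in $H_u = L^2(0, t_f)$ and $\mathrm{dom}(L) \subset H^2(0, t_f)$. The parameter set is taken to be $q = [C, K, F_0, z_0, z_1]$ in the admissible parameter space $Q = [0, \infty) \times (0, \infty) \times \mathbb{R}^3$.

Alternatively, one can consider the state $u = [z, \dot{z}]^T$ in $H^1(0, t_f) \times L^2(0, t_f)$ with the operators

$$L(q) = \frac{d}{dt} + \begin{bmatrix} 0 & -1 \\ K & C \end{bmatrix}, \quad F(q) = \begin{bmatrix} 0 \\ F_0\cos(\omega_F t) \end{bmatrix}, \quad B(q) = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \quad G(q) = \begin{bmatrix} z_0 \\ z_1 \end{bmatrix}.$$

Whereas this definition isolates the action of the two states, we employ (3.46) in the sensitivity analysis of Chapter 14 since it facilitates construction of the adjoint operator.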


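Example 3.11. To pose the heat equation (3.24) in the framework (3.43)-(3.44), we take the parameters to be $q = [\alpha, u_0, u_\ell, u_r] \in H^1(-1,1) \times C[-1,1] \times \mathbb{R}^2$ and note that $\mathbf{x} = [x, t] \in [-1, 1] \times [0, t_f]$. We define the operators to be

$$L(q)u = \frac{\partial u}{\partial t} - \frac{\partial}{\partial x}\left(\alpha\frac{\partial u}{\partial x}\right), \qquad F(q) = f,$$
$$B(q)u = u, \qquad G(q) = \alpha_0 u_0 + \alpha_\ell u_\ell + \alpha_r u_r,$$

where

$$\alpha_0 = \begin{cases} 1, & -1 < x < 1,\ t = 0, \\ 0, & \text{else}, \end{cases} \qquad \alpha_\ell = \begin{cases} 1, & x = -1,\ t > 0, \\ 0, & \text{else}, \end{cases} \qquad \alpha_r = \begin{cases} 1, & x = 1,\ t > 0, \\ 0, & \text{else}. \end{cases}$$

The regularity of the Hilbert space $H_u$ depends on the regularity of $H_F$. For example, if $u_\ell = u_r = 0$ and $f \in L^2((0, t_f), H^{-1}(-1,1))$, where $H^{-1}$ is the dual space of $H_0^1(-1,1)$, then $u \in L^2((0, t_f), X)$, where $X = L^2(-1,1)$; e.g., see [201, 207].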

Example 3.12. For the boundary value problem (3.25) and response (3.27), the parameters are $q = [\Sigma_a, D, S, \Sigma_f] \in \mathbb{R}^4$ and $\mathbf{x} = x \in [-a, a]$. The operators are

$$L(q)u = \Sigma_a u - D\frac{d^2u}{dx^2}, \quad F(q) = S, \quad B(q)u = u, \quad G(q) = 0,$$

and the response can be formulated as

$$y = \mathcal{R}(u, q) = \int_{-a}^{a} \Sigma_f u(x)\delta(x - \bar{x})\,dx.$$

Appropriate Hilbert spaces are $H_u = H_F = L^2(-a, a)$ with the inner product

$$\langle f, g \rangle = \int_{-a}^{a} f(x)g(x)\,dx.$$

Here $\mathrm{dom}(L) \subset H_u$ can be taken to be $H^2(-a, a) \cap \{\phi \in H^1(-a, a) \,|\, \phi(\pm a) = 0\}$.

Example 3.13. The matrix system (3.30) can be posed in the abstract framework (3.43) by taking $L = A$, $F = f$, and $u = \theta$, with $H_u$ and $H_F$ taken to be the Euclidean space $\mathbb{R}^{N-1}$ with the standard dot product. The adjoint operator is the matrix transpose $L^* = A^T$.


3.3.2 Nonlinear Systems

For models such as Burgers' equation, illustrated in Example 3.8, the general framework must be extended to accommodate the nonlinearity in the dependent variable. In this case the model can be formulated as

$$\mathcal{N}(u(\mathbf{x}), q(\mathbf{x})) = F(q(\mathbf{x})), \quad \mathbf{x} \in \Omega, \qquad (3.47)$$

where $\mathcal{N} = [\mathcal{N}_1(u, q), \ldots, \mathcal{N}_N(u, q)]^T$ is an $N$-vector of nonlinear operators. The definitions of $\mathbf{x}$, $u$, $q$, and $F$ are the same as those employed in Section 3.3.1 for linear operators. Similarly, initial or boundary conditions are represented by

$$B(u, q) = G(q), \quad \mathbf{x} \in \partial\Omega. \qquad (3.48)$$

We refer the reader to [50, 53] for additional details and examples regarding the representation of nonlinear models in this framework.


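Example 3.14. For the viscous Burgers' equation (3.37), the parameters are $q = [u_0, \mu, u_\ell, u_r] \in C[0,1] \times \mathbb{R}^3$ and the operators are

$$\mathcal{N}(u, q) = \frac{\partial u}{\partial t} + u\frac{\partial u}{\partial x} - \mu\frac{\partial^2 u}{\partial x^2}, \qquad F(q) = 0,$$
$$B(u, q) = u, \qquad G(q) = \alpha_0 u_0 + \alpha_\ell u_\ell + \alpha_r u_r,$$

where $\alpha_0$, $\alpha_\ell$, and $\alpha_r$ are defined in a manner analogous to Example 3.11.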


3.4 Notation for Parameters and Inputs

The notation for parameters varies widely among disciplines. In statistics, $\theta$ is commonly employed to denote calibration parameters, whereas general parameters are






Chapter 3. Prototypical Models 


commonly denoted by $q$ and $\alpha$ in the mathematics and nuclear engineering literature. Numerous other conventions are employed in other sciences and engineering.

We use the mathematics notation $q$ for two reasons. The first is that it permits significant flexibility when discussing the associated random variables Q and admissible parameter spaces Q and Q. Second, it is not limited to solely representing calibration parameters as is the case for $\theta$ in the statistics literature. This is important since we are often interested in propagating uncertainties associated with both calibration parameters, which are determined using the statistical techniques of Chapters 7 and 8, and physical parameters, whose uncertainties may be determined from prior experiments. We do not differentiate between calibration and physical parameters since the designation is typically indicated by the context. Whereas we use $q$ to denote this combined parameter set, we warn readers that the notation in their respective disciplines will often differ.

As indicated in Definition 1.2, the term inputs is used to designate parameters, 
initial conditions, boundary conditions, or exogenous forces that exhibit uncertain- 
ties which must be determined and propagated through models to quantify response 
uncertainties. Once the techniques of Chapter 6 have been employed to represent 
random inputs, we often treat the terms inputs and parameters as synonymous. 


3.5 Exercises 


Exercise 3.1. Consider the heat equation (3.24) with constant diffusivity $\alpha$ and $f(t,x) = T_\ell = T_r = 0$. For $N + 1$ spatial gridpoints and temporal stepsize $k$, define the grid $(x_i, t_j)$, where $x_i = -1 + ih$, $h = \frac{2}{N}$ for $i = 0, \ldots, N$, and $t_j = jk$. Approximate solutions at gridpoints are denoted by $T_{i,j} \approx T(x_i, t_j)$. Use a central difference Taylor approximation in space and forward difference in time to obtain the discrete relation

$$\frac{T_{i,j+1} - T_{i,j}}{k} = \alpha\,\frac{T_{i+1,j} - 2T_{i,j} + T_{i-1,j}}{h^2}$$

or

$$T_{i,j+1} = (1 - 2\lambda)T_{i,j} + \lambda(T_{i+1,j} + T_{i-1,j}), \qquad (3.49)$$

where $\lambda = \frac{\alpha k}{h^2}$. The initial conditions are $T_{i,0} = T_0(x_i)$ and the boundary conditions yield $T_{0,j} = T_{N,j} = 0$. The relation (3.49) can be expressed as

$$T^{j+1} = AT^j = A^{j+1}T^0, \qquad (3.50)$$

where $T^j = [T_{1,j}, \ldots, T_{N-1,j}]^T$ and

$$A = \begin{bmatrix} 1-2\lambda & \lambda & & & \\ \lambda & 1-2\lambda & \lambda & & \\ & \ddots & \ddots & \ddots & \\ & & \lambda & 1-2\lambda & \lambda \\ & & & \lambda & 1-2\lambda \end{bmatrix}. \qquad (3.51)$$


For applications in which the initial heat distribution $T_0(x)$ is unknown and treated as a random field, (3.50) provides a discretized form of the problem that depends linearly on the random vector $T^0$.
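The two forms of the scheme can be compared numerically. The sketch below implements the pointwise update (3.49) and the matrix iteration (3.50) with the tridiagonal $A$ of (3.51), and checks that they agree. It assumes a hypothetical initial profile $T_0(x) = 1 - x^2$ and keeps $\lambda \le 1/2$, the standard stability restriction for this explicit scheme (not stated in the exercise itself).

```python
# Sketch of Exercise 3.1: explicit scheme (3.49) versus matrix form (3.50)-(3.51).
# Hypothetical data: T_0(x) = 1 - x^2, alpha = 1, N = 20, k = 0.002 (lambda = 0.2).


def ftcs_loop(T0, lam, n_steps):
    """Update interior values with (3.49); boundary values are held at 0."""
    T = list(T0)
    for _ in range(n_steps):
        T = [0.0] + [(1 - 2 * lam) * T[i] + lam * (T[i + 1] + T[i - 1])
                     for i in range(1, len(T) - 1)] + [0.0]
    return T


def build_A(n_interior, lam):
    """Tridiagonal (N-1) x (N-1) matrix A of (3.51)."""
    A = [[0.0] * n_interior for _ in range(n_interior)]
    for i in range(n_interior):
        A[i][i] = 1 - 2 * lam
        if i > 0:
            A[i][i - 1] = lam
        if i < n_interior - 1:
            A[i][i + 1] = lam
    return A


def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]


N, alpha, k = 20, 1.0, 0.002
h = 2.0 / N
lam = alpha * k / h**2                  # about 0.2 for these values
x = [-1.0 + i * h for i in range(N + 1)]
T0 = [1.0 - xi**2 for xi in x]          # T_0(x_i), zero at x = -1 and x = 1

A = build_A(N - 1, lam)
Tj = T0[1:-1]                           # interior vector T^0
for _ in range(50):
    Tj = matvec(A, Tj)                  # T^{j+1} = A T^j
T_loop = ftcs_loop(T0, lam, 50)
```

If the entries of $T^0$ are treated as random, each $T^j = A^j T^0$ is a linear transformation of that random vector, which is the property emphasized above.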




Chapter 4

Fundamentals of Probability, Random Processes, and Statistics


We summarize in this chapter those aspects of probability, random processes, and statistics that are employed in subsequent chapters. The discussion is necessarily brief, and additional details can be found in the references cited in the text and noted in Section 4.9.

4.1 Random Variables, Distributions, and Densities 

When constructing statistical models for physical and biological processes, we will consider parameters and measurement errors to be random variables whose statistical properties or distributions we wish to infer using measured data. The classical probability space provides the basis for defining and illustrating these concepts.

Definition 4.1 (Probability Space). A probability space $(\Omega, \mathcal{F}, P)$ consists of three components:

$\Omega$: sample space is the set of all possible outcomes from an experiment;

$\mathcal{F}$: $\sigma$-field of subsets of $\Omega$ that contains all events of interest;

$P: \mathcal{F} \to [0,1]$: probability or measure that satisfies the postulates

(i) $P(\emptyset) = 0$,

(ii) $P(\Omega) = 1$,

(iii) if $A_i \in \mathcal{F}$ and $A_i \cap A_j = \emptyset$ for $i \neq j$, then $P\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} P(A_i)$.

We note that the concept of probability depends on whether one is considering a frequentist (classical) or Bayesian perspective. In the frequentist view, probabilities are defined as the frequency with which an event occurs if the experiment is repeated a large number of times. The Bayesian perspective treats probabilities as a distribution of subjective values, rather than a single frequency, that are constructed or updated as data is observed.
Example 4.2. Consider an experiment in which we flip two individual coins (e.g., a quarter and a nickel) multiple times and record the outcome, which consists of an ordered pair. The sample space and $\sigma$-field of events are thus

$$\Omega = \{(H,H), (T,H), (H,T), (T,T)\},$$
$$\mathcal{F} = \{\emptyset, (H,H), (T,H), (H,T), (T,T), \Omega, \{(H,T), (T,H)\}, \ldots\}. \qquad (4.1)$$

Note that $\mathcal{F}$ contains all countable intersections and unions of elements in $\Omega$. If we flip the pair twice, two possible events are $A = \{(T,H)\}$ for the first flip and $B = \{(H,H)\}$ for the second flip. For fair coins, the frequentist perspective yields the probabilities

$$P(A) = \frac{1}{4}, \quad P(B) = \frac{1}{4}, \quad P(A \cap B) = \frac{1}{16}, \quad P(A \cup B) = \frac{7}{16}.$$

We note that because the events are independent, $P(A \cap B) = P(A)P(B)$. We will revisit the probabilities associated with flipping a coin from the Bayesian perspective in Example 4.66 of Section 4.8.2.

We now define univariate random variables, distributions, and densities.

4.1.1 Univariate Concepts 

Definition 4.3 (Random Variable). A random variable is a function $X: \Omega \to \mathbb{R}$ with the property that $\{\omega \in \Omega \,|\, X(\omega) \le x\} \in \mathcal{F}$ for each $x \in \mathbb{R}$; i.e., it is measurable. A random variable is said to be discrete if it takes values in a countable subset $\{x_1, x_2, \ldots\}$ of $\mathbb{R}$.

Definition 4.4 (Realization). The value

$$x = X(\omega)$$

of a random variable $X$ for an event $\omega \in \Omega$ is termed a realization of $X$.

We note that in the statistics literature, many authors employ the same notation for the random variable and realization and let the context dictate the meaning. For those who are new to the field, this can obscure the meaning and, to the degree possible, we will use different notation for random variables and their realizations.

Definition 4.5 (Cumulative Distribution Function). Associated with every random variable $X$ is a cumulative distribution function (cdf) $F_X: \mathbb{R} \to [0, 1]$ given by

$$F_X(x) = P\{\omega \in \Omega \,|\, X(\omega) \le x\}. \qquad (4.2)$$

This is often expressed as $F_X(x) = P\{X \le x\}$, which should be interpreted in the sense of (4.2). The following example illustrates the construction of a cdf for a discrete random variable.
Example 4.6. Consider the experiment of Example 4.2 in which our event $\omega$ consists of a single flip of a pair of coins. We define $X(\omega)$ to be the number of heads associated with the event so that

$$X(H,H) = 2, \qquad X(H,T) = X(T,H) = 1, \qquad X(T,T) = 0.$$

For $x < 0$, the probability of finding an event $\omega \in \Omega$ such that $X(\omega) \le x$ is 0, so $F_X(x) = 0$ for $x < 0$. Similar analysis yields the cdf relation

$$F_X(x) = \begin{cases} 0, & x < 0, \\ 1/4, & 0 \le x < 1, \\ 3/4, & 1 \le x < 2, \\ 1, & x \ge 2, \end{cases}$$

which is plotted in Figure 4.1.
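The cdf of Example 4.6 can be reproduced by direct enumeration. This Python sketch lists the four equally likely outcomes of a single flip of a pair of fair coins, defines $X$ as the number of heads, and evaluates $F_X(x) = P\{X \le x\}$ with exact fractions; the function names are illustrative only.

```python
# Sketch of Example 4.6: cdf of X = number of heads for one flip of two fair coins.
from fractions import Fraction
from itertools import product

outcomes = list(product("HT", repeat=2))       # sample space Omega (4 outcomes)
prob = Fraction(1, len(outcomes))              # fair coins: each outcome has 1/4


def X(omega):
    """Random variable: number of heads in the outcome."""
    return omega.count("H")


def cdf(x):
    """F_X(x) = P{omega in Omega | X(omega) <= x}."""
    return sum(prob for omega in outcomes if X(omega) <= x)


for x in [-0.5, 0, 0.5, 1, 1.5, 2, 3]:
    print(x, cdf(x))
```

Evaluating just below and at each integer shows the jumps and the right-continuity discussed next.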


It is observed that, by construction, the cdf satisfies the properties

(i) $\lim_{x \to -\infty} F_X(x) = 0$,

(ii) $x_1 \le x_2 \Rightarrow F_X(x_1) \le F_X(x_2)$, (4.3)

(iii) $\lim_{x \to \infty} F_X(x) = 1$.

This is an example of a càdlàg (French "continue à droite, limite à gauche") function that is right-continuous and has left limits everywhere. These functions also arise in stochastic processes that admit jumps.

For continuous and discrete random variables, the probability density function and probability mass function are defined as follows.


Definition 4.7 (Probability Density Function). The random variable $X$ is continuous if its cdf is absolutely continuous and hence can be expressed as

$$F_X(x) = \int_{-\infty}^{x} f_X(s)\,ds, \quad x \in \mathbb{R},$$

where the derivative $f_X = \frac{dF_X}{dx}$ mapping $\mathbb{R}$ to $[0, \infty)$ is called the probability density function (pdf) of $X$.

Figure 4.1. Cdf for Example 4.6.




Definition 4.8 (Probability Mass Function). The probability mass function of a discrete random variable $X$ is given by $f_X(x) = P(X = x)$.


The pdf properties

(i) $f_X(x) \ge 0$,

(ii) $\int_{\mathbb{R}} f_X(x)\,dx = 1$,

(iii) $P(x_1 \le X \le x_2) = F_X(x_2) - F_X(x_1) = \int_{x_1}^{x_2} f_X(x)\,dx$

follow immediately from the definition and (4.3). The attributes of density functions can be further specified by designating their location or centrality, their spread or variability, their symmetry, and the contribution of tail behavior. In general, this information is provided by moments

$$E(X^n) = \int_{\mathbb{R}} x^n f_X(x)\,dx$$

or central moments. For example, the mean

$$\mu = E(X) = \int_{\mathbb{R}} x f_X(x)\,dx,$$

also termed the first moment or expected value, provides a measure of the density's central location, whereas the second central moment

$$\sigma^2 = \mathrm{var}(X) = E[(X - \mu)^2] = \int_{\mathbb{R}} (x - \mu)^2 f_X(x)\,dx \qquad (4.4)$$

provides a measure of the density's variability or width. This typically is termed the variance of $X$, and $\sigma$ is called the standard deviation. One often employs the relation

$$\sigma^2 = E(X^2) - \mu^2,$$

which results directly from (4.4). We note that the third central moment normalized by $\sigma^3$ (skewness) quantifies the density's symmetry about $\mu$, whereas the fourth central moment normalized by $\sigma^4$ (kurtosis) quantifies the magnitude of tail contributions.
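The moment definitions and the relation $\sigma^2 = E(X^2) - \mu^2$ can be checked by quadrature. This sketch assumes, for illustration only, the density $f_X(x) = 2x$ on $[0, 1]$, for which $\mu = 2/3$ and $\mathrm{var}(X) = 1/18$, and uses a composite trapezoid rule.

```python
# Sketch: mean, variance via (4.4), and the shortcut sigma^2 = E(X^2) - mu^2,
# computed by the composite trapezoid rule for the density f(x) = 2x on [0, 1].


def trapezoid(g, a, b, n=20000):
    dx = (b - a) / n
    total = 0.5 * (g(a) + g(b)) + sum(g(a + i * dx) for i in range(1, n))
    return total * dx


f = lambda x: 2.0 * x                          # pdf on [0, 1], zero elsewhere

mass = trapezoid(f, 0.0, 1.0)                  # property (ii): should be 1
mu = trapezoid(lambda x: x * f(x), 0.0, 1.0)   # first moment
second = trapezoid(lambda x: x**2 * f(x), 0.0, 1.0)
var_central = trapezoid(lambda x: (x - mu) ** 2 * f(x), 0.0, 1.0)  # (4.4)
var_shortcut = second - mu**2                  # sigma^2 = E(X^2) - mu^2
```

Both variance computations agree with $1/18$ to within the quadrature error.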


Important Distributions for Inference and Model Calibration

We summarize next properties of the univariate normal, uniform, chi-squared, Student's t, beta, gamma, inverse-gamma, and inverse chi-squared distributions, which are important for frequentist and Bayesian inference and model calibration.


Definition 4.9 (Normal Distribution). In uncertainty quantification, a commonly employed univariate density is the normal density

$$f_X(x) = \frac{1}{\sigma\sqrt{2\pi}}e^{-(x-\mu)^2/2\sigma^2}, \quad -\infty < x < \infty.$$

The associated cdf is

$$F_X(x) = \int_{-\infty}^{x} f_X(s)\,ds = \frac{1}{2}\left[1 + \mathrm{erf}\left(\frac{x - \mu}{\sigma\sqrt{2}}\right)\right],$$

where the error function is defined to be

$$\mathrm{erf}(x) = \frac{2}{\sqrt{\pi}}\int_0^x e^{-s^2}\,ds.$$

The notation $X \sim N(\mu, \sigma^2)$ indicates that the random variable $X$ is normally distributed with mean $\mu$ and variance $\sigma^2$. For the normal density, 68.27% of the area is within $1\sigma$ of the mean $\mu$ and 95.45% is within $2\sigma$, as illustrated in Figure 4.2(a).
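The erf representation of the normal cdf can be used to recover the areas quoted above. This sketch uses the Python standard-library `math.erf` and the values $\mu = 0.5$, $\sigma = 0.4$ from Figure 4.2(a); the helper names are illustrative.

```python
# Sketch: normal pdf and cdf via the error function, and the 1-sigma / 2-sigma areas.
import math


def normal_pdf(x, mu=0.0, sigma=1.0):
    return math.exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))


def normal_cdf(x, mu=0.0, sigma=1.0):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


mu, sigma = 0.5, 0.4                            # values used in Figure 4.2(a)
within_1 = normal_cdf(mu + sigma, mu, sigma) - normal_cdf(mu - sigma, mu, sigma)
within_2 = normal_cdf(mu + 2 * sigma, mu, sigma) - normal_cdf(mu - 2 * sigma, mu, sigma)
print(f"{100 * within_1:.2f}% within 1 sigma, {100 * within_2:.2f}% within 2 sigma")
# prints: 68.27% within 1 sigma, 95.45% within 2 sigma
```

The percentages do not depend on $\mu$ or $\sigma$, since they equal $\mathrm{erf}(1/\sqrt{2})$ and $\mathrm{erf}(\sqrt{2})$.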


Definition 4.10 (Continuous Uniform Distribution). A random variable $X$ is uniformly distributed on the interval $[a, b]$, denoted by $X \sim U(a, b)$, if its density is constant on $[a, b]$, so that subintervals of equal length are equally probable. The pdf and cdf are thus

$$f_X(x) = \frac{1}{b - a}\chi_{[a,b]}(x), \qquad F_X(x) = \begin{cases} 0, & x < a, \\ \dfrac{x - a}{b - a}, & a \le x < b, \\ 1, & x \ge b, \end{cases}$$

where the characteristic function $\chi_{[a,b]}(x)$ is defined to be unity on the interval $[a, b]$ and 0 elsewhere. The pdf is plotted in Figure 4.2(b). It is established in Exercise 4.1 that the mean and variance of $X$ are

$$E(X) = \frac{a + b}{2}, \qquad \mathrm{var}(X) = \frac{(b - a)^2}{12}. \qquad (4.7)$$

Figure 4.2. (a) Normal density with $\mu = 0.5$ and $\sigma = 0.4$ and areas within $1\sigma$ and $2\sigma$ of $\mu$. (b) Uniform density on the interval $[a, b]$.
The relation between $X \sim N(\mu, \sigma^2)$ and $Z \sim N(0, 1)$ is established in Exercise 4.6. When prior information is lacking, it is often assumed that model parameters have a uniform density.
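A Monte Carlo check of (4.7) is straightforward. The sketch assumes a hypothetical interval $[a, b] = [2, 5]$, for which $E(X) = 3.5$ and $\mathrm{var}(X) = 0.75$, and uses Python's `random.uniform` with a fixed seed.

```python
# Sketch: Monte Carlo check of the uniform mean and variance (4.7) on [2, 5].
import random
import statistics

a, b = 2.0, 5.0
rng = random.Random(0)
samples = [rng.uniform(a, b) for _ in range(200_000)]

mean_mc = statistics.fmean(samples)
var_mc = statistics.pvariance(samples)
mean_exact = (a + b) / 2          # 3.5
var_exact = (b - a) ** 2 / 12     # 0.75
```

With $2 \times 10^5$ samples the sampling error in each estimate is on the order of $10^{-3}$.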


Definition 4.11 (Chi-Squared Distribution). Let $X \sim N(0, 1)$ be normally distributed. The random variable $Y = X^2$ then has a chi-squared distribution with 1 degree of freedom, denoted by $Y \sim \chi^2(1)$. Furthermore, if $Y_i$, $i = 1, \ldots, k$, are independent $\chi^2(1)$ random variables, then their sum $Z = \sum_{i=1}^{k} Y_i$ is a chi-squared random variable with $k$ degrees of freedom, denoted by $Z \sim \chi^2(k)$ or $Z \sim \chi^2_k$. The pdf

$$f_Z(z; k) = \begin{cases} \dfrac{z^{k/2-1}e^{-z/2}}{2^{k/2}\Gamma(k/2)}, & z > 0, \\ 0, & z \le 0, \end{cases}$$

can be compactly expressed in terms of the gamma function, where $\Gamma(k/2) = (k/2 - 1)!$ for even $k$ and $\Gamma(k/2) = \frac{(k-2)!!}{2^{(k-1)/2}}\sqrt{\pi}$ for odd $k$, and exhibits the behavior shown in Figure 4.3(a). The mean and variance of $Z$ are

$$E(Z) = k, \qquad \mathrm{var}(Z) = 2k.$$

Chi-squared distributions naturally arise when evaluating the sum of squares error between measured data and model values when estimating model parameters.
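The construction in Definition 4.11 can be simulated directly: sum $k$ squared $N(0,1)$ draws and compare the sample mean and variance with $k$ and $2k$. The sketch also checks that the pdf formula integrates to about 1. The choice $k = 4$, the seed, and the sample size are arbitrary.

```python
# Sketch: chi-squared samples as sums of k squared N(0,1) draws, plus a check
# that the pdf formula of Definition 4.11 integrates to about 1.
import math
import random
import statistics


def chi2_pdf(z, k):
    if z <= 0:
        return 0.0
    return z ** (k / 2 - 1) * math.exp(-z / 2) / (2 ** (k / 2) * math.gamma(k / 2))


k = 4
rng = random.Random(1)
Z = [sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k)) for _ in range(100_000)]
mean_Z, var_Z = statistics.fmean(Z), statistics.pvariance(Z)   # near k and 2k

# Midpoint rule on [0, 60]; the neglected tail is negligible for k = 4
dz = 0.001
mass = sum(chi2_pdf((i + 0.5) * dz, k) for i in range(60_000)) * dz
```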


Definition 4.12 (Student's t-Distribution). Let $X \sim N(0, 1)$ and $Z \sim \chi^2(k)$ be independent random variables. The random variable

$$T = \frac{X}{\sqrt{Z/k}}$$

has a Student's t-distribution (or simply t-distribution) with $k$ degrees of freedom.

Figure 4.3. (a) Chi-squared density for $k = 1, \ldots, 5$ and (b) Student's t-density with $k = 1, 2, 10$ compared with the normal density with $\mu = 0$, $\sigma = 1$.

The pdf can be expressed as

$$f_T(t; k) = \frac{\Gamma((k+1)/2)}{\Gamma(k/2)\sqrt{k\pi}}\left(1 + \frac{t^2}{k}\right)^{-(k+1)/2},$$

where $\Gamma$ again denotes the gamma function. Note that

$$f_T(t; 1) = \frac{1}{\pi(1 + t^2)}$$

is a special case of the Cauchy distribution. As illustrated in Figure 4.3(b), the density is symmetric and bell-shaped, like the normal density, but exhibits heavier tails.
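The t-density can be evaluated stably with the log-gamma function. This sketch checks the Cauchy special case $k = 1$, confirms that the tails at $t = 4$ are heavier than the normal tail for $k = 1, 2, 10$, and shows the approach to $N(0,1)$ for very large $k$; names are illustrative.

```python
# Sketch: Student's t density via math.lgamma (avoids overflow of Gamma for large k).
import math


def t_pdf(t, k):
    log_c = math.lgamma((k + 1) / 2) - math.lgamma(k / 2) - 0.5 * math.log(k * math.pi)
    return math.exp(log_c) * (1 + t * t / k) ** (-(k + 1) / 2)


def std_normal_pdf(t):
    return math.exp(-t * t / 2) / math.sqrt(2 * math.pi)


# k = 1 should reproduce the Cauchy density 1/(pi (1 + t^2))
cauchy_gap = max(abs(t_pdf(t, 1) - 1 / (math.pi * (1 + t * t))) for t in [-3, -1, 0, 0.5, 2])
# Ratio of t density to normal density in the tail (values > 1 mean heavier tails)
tail_ratio = {k: t_pdf(4.0, k) / std_normal_pdf(4.0) for k in (1, 2, 10)}
```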


It will be shown in Section 7.2 that the t-distribution naturally arises when estimating the mean of a population when the sample size is relatively small and the population variance is unknown.

On a historic note, aspects of this theory were developed by William Sealy Gosset, an employee of the Guinness brewery in Dublin, in an effort to select optimally yielding varieties of barley based on relatively small sample sizes. To improve perception following the recent disclosure of confidential information by another employee, Gosset was allowed to publish only under the pseudonym "Student." The importance of his work was advocated by both Karl Pearson and R.A. Fisher.


Definition 4.13 (Gamma Distribution). The gamma distribution is a two-parameter family with two common parameterizations: (i) shape parameter $\alpha > 0$ and scale parameter $\lambda > 0$ or (ii) shape parameter $\alpha$ and inverse scale or rate parameter $\beta = 1/\lambda$. We employ the second since the inverse-gamma distribution formulated in terms of $\alpha$ and $\beta$ is a conjugate prior for likelihoods associated with normal distributions with known mean and unknown variance; see Example 4.69. For $X \sim \mathrm{Gamma}(\alpha, \beta)$, the density is

$$f_X(x; \alpha, \beta) = \frac{\beta^\alpha}{\Gamma(\alpha)}x^{\alpha - 1}e^{-\beta x}, \quad x > 0,$$

and the expected value and variance are $E(X) = \alpha/\beta$ and $\mathrm{var}(X) = \alpha/\beta^2$.

In MATLAB®, random values from a gamma distribution can be generated using the command gamrnd.m, which uses the first parameterization based on the shape and scale parameters $\alpha$ and $\lambda$.

We point out that the one-parameter $\chi^2_k$ distribution with $k$ degrees of freedom is a special case of the gamma distribution with $\alpha = k/2$ and $\beta = 1/2$.
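The parameterization issue also arises outside MATLAB. Python's `random.gammavariate(alpha, beta)` takes a shape and a *scale* argument (mean $\alpha \cdot$ scale), so to sample $\mathrm{Gamma}(\alpha, \beta)$ in the rate form used here one passes the scale $\lambda = 1/\beta$. The sketch below does this and also samples $\chi^2_k$ as $\mathrm{Gamma}(k/2, 1/2)$; parameter values are arbitrary.

```python
# Sketch: Gamma(alpha, beta) in the rate parameterization via Python's
# random.gammavariate, which expects shape and scale (scale = 1/beta).
import random
import statistics


def gamma_rate_sample(rng, alpha, beta, n):
    return [rng.gammavariate(alpha, 1.0 / beta) for _ in range(n)]


rng = random.Random(2)
alpha, beta = 3.0, 2.0
X = gamma_rate_sample(rng, alpha, beta, 100_000)   # E(X) = 1.5, var(X) = 0.75

# chi-squared with k degrees of freedom as Gamma(k/2, 1/2)
k = 6
chi2 = gamma_rate_sample(rng, k / 2, 0.5, 100_000)  # sample mean should be near k
```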


Definition 4.14 (Inverse-Gamma Distribution). If $X$ has a gamma distribution, then $Y = X^{-1}$ has an inverse-gamma distribution with parameters that satisfy

$$X \sim \mathrm{Gamma}(\alpha, \beta) \iff Y \sim \mathrm{Inv\text{-}gamma}(\alpha, \beta). \qquad (4.9)$$

Hence the density of $Y$ is

$$f_Y(y; \alpha, \beta) = \frac{\beta^\alpha}{\Gamma(\alpha)}y^{-\alpha - 1}e^{-\beta/y}, \quad y > 0,$$

and the mean and variance are $E(Y) = \frac{\beta}{\alpha - 1}$ for $\alpha > 1$ and $\mathrm{var}(Y) = \frac{\beta^2}{(\alpha - 1)^2(\alpha - 2)}$ for $\alpha > 2$.

As noted in Definition 4.13 and illustrated in Example 4.69, the inverse-gamma distribution is the conjugate prior for normal likelihoods that are functions of the variance. The equivalence (4.9) can be used to generate random inverse-gamma values using the MATLAB Statistics Toolbox command gamrnd.m. Since x = gamrnd(α, λ) is parameterized in terms of the scale parameter, one would employ the command y = 1./gamrnd(α, λ), with λ = 1/β, to generate realizations of $Y \sim \mathrm{Inv\text{-}gamma}(\alpha, \beta)$. A technique to construct random realizations from the inverse-gamma distribution, if gamrnd.m is not available, is discussed at the end of this section.
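The same reciprocal construction can be sketched in Python. Since `random.gammavariate` also takes a scale argument, the sketch passes $\lambda = 1/\beta$ and sets $y = 1/x$, mirroring `y = 1./gamrnd(α, λ)`. The values $\alpha = 8$, $\beta = 14$ are hypothetical and give $E(Y) = 2$ and $\mathrm{var}(Y) = 2/3$.

```python
# Sketch of (4.9): Y = 1/X with X ~ Gamma(alpha, beta) (rate form) gives
# Y ~ Inv-gamma(alpha, beta). gammavariate takes a scale, so pass 1/beta.
import random
import statistics


def inv_gamma_sample(rng, alpha, beta, n):
    return [1.0 / rng.gammavariate(alpha, 1.0 / beta) for _ in range(n)]


alpha, beta = 8.0, 14.0
Y = inv_gamma_sample(random.Random(3), alpha, beta, 200_000)
mean_exact = beta / (alpha - 1)                          # 2.0 (requires alpha > 1)
var_exact = beta**2 / ((alpha - 1) ** 2 * (alpha - 2))   # 2/3 (requires alpha > 2)
```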


Definition 4.15 (Inverse Chi-Squared Distribution). The inverse chi-squared distribution is a special case of $\mathrm{Inv\text{-}gamma}(\alpha, \beta)$ with $\alpha = k/2$ and $\beta = 1/2$, so the density is

$$f_Y(y; k) = \frac{2^{-k/2}}{\Gamma(k/2)}y^{-k/2 - 1}e^{-1/(2y)}$$

for $y > 0$. This reparameterization can facilitate manipulation of conjugate families when constructing Bayesian posterior distributions.
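The special-case relation can be confirmed pointwise. This sketch evaluates the inverse chi-squared density and the inverse-gamma density with $\alpha = k/2$, $\beta = 1/2$ on a grid for a few values of $k$ and records the largest discrepancy.

```python
# Sketch: the inverse chi-squared density equals Inv-gamma(k/2, 1/2) pointwise.
import math


def inv_gamma_pdf(y, alpha, beta):
    if y <= 0:
        return 0.0
    return beta**alpha / math.gamma(alpha) * y ** (-alpha - 1) * math.exp(-beta / y)


def inv_chi2_pdf(y, k):
    if y <= 0:
        return 0.0
    return 2 ** (-k / 2) / math.gamma(k / 2) * y ** (-k / 2 - 1) * math.exp(-1 / (2 * y))


grid = [0.05 * i for i in range(1, 200)]
max_gap = max(abs(inv_chi2_pdf(y, k) - inv_gamma_pdf(y, k / 2, 0.5))
              for k in (1, 3, 6) for y in grid)
```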


Definition 4.16 (Beta Distribution). The random variable $X \sim \mathrm{Beta}(\alpha, \beta)$ has a beta distribution if it has the density

$$f_X(x; \alpha, \beta) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)}x^{\alpha - 1}(1 - x)^{\beta - 1}$$

for $x \in [0, 1]$. As illustrated in Example 4.68, it is the conjugate prior for the binomial likelihood. It is observed that if $\alpha = \beta = 1$, the beta distribution is simply the uniform distribution, which is often used to provide noninformative priors. Realizations from the beta distribution can be generated using the MATLAB command betarnd.m.
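A short sketch of the beta density follows, checking that $\mathrm{Beta}(1,1)$ reduces to the uniform density on $[0,1]$ and that samples from Python's `random.betavariate` have mean close to $\alpha/(\alpha+\beta)$. The values $\alpha = 2$, $\beta = 5$ and the seed are arbitrary; `betavariate` plays the role that betarnd.m plays in MATLAB.

```python
# Sketch: beta density via math.gamma, the Beta(1, 1) = U(0, 1) special case,
# and sampling with random.betavariate.
import math
import random
import statistics


def beta_pdf(x, a, b):
    if not 0.0 <= x <= 1.0:
        return 0.0
    coef = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return coef * x ** (a - 1) * (1 - x) ** (b - 1)


uniform_gap = max(abs(beta_pdf(x, 1.0, 1.0) - 1.0) for x in [0.0, 0.25, 0.5, 0.9, 1.0])

rng = random.Random(4)
a, b = 2.0, 5.0
samples = [rng.betavariate(a, b) for _ in range(100_000)]
mean_mc = statistics.fmean(samples)      # compare with a/(a + b) = 2/7
```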


Definition 4.17 (Quantile-Quantile (Q-Q) Plots). A Q-Q plot is a graphical method for comparing data from two distributions by plotting their quantiles against each other. We will typically use this to determine the degree to which data is Gaussian, but the technique can be used to compare any distributions. If distributions are linearly related, Q-Q plots will be approximately linear. In MATLAB, Q-Q plots can be generated using the command qqplot.m.

To illustrate, we compare in Figure 4.4 realizations from $N(3, 4)$ and $U(0, 1)$ distributions with data from a $N(0, 1)$ distribution. The linearity in the first case illustrates that the two are from the same family, whereas the quantiles differ significantly in the comparison between the uniform and normal data.

Figure 4.4. Q-Q plot for (a) $N(3, 4)$ and (b) $U(0, 1)$ data as compared with $N(0, 1)$ data.
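The computation behind a normal Q-Q plot can be sketched without plotting: sort the data and pair each order statistic with the $N(0,1)$ quantile at a plotting position. The sketch assumes the common position $(i - 0.5)/n$ and uses `statistics.NormalDist().inv_cdf`. For $N(3, 4)$ data the pairs should be nearly linear, with slope near $\sigma = 2$ and intercept near $\mu = 3$, while uniform data show a lower correlation.

```python
# Sketch of a normal Q-Q comparison (no plotting): order statistics versus
# N(0, 1) quantiles at plotting positions (i - 0.5)/n, an assumed convention.
import random
from statistics import NormalDist, fmean


def qq_pairs(data):
    n = len(data)
    theo = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return theo, sorted(data)


def fit_line(xs, ys):
    mx, my = fmean(xs), fmean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx


def correlation(xs, ys):
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(5)
normal_data = [rng.gauss(3.0, 2.0) for _ in range(2000)]     # N(3, 4): sigma = 2
uniform_data = [rng.uniform(0.0, 1.0) for _ in range(2000)]

slope, intercept = fit_line(*qq_pairs(normal_data))
r_normal = correlation(*qq_pairs(normal_data))
r_uniform = correlation(*qq_pairs(uniform_data))
```

Note that `NormalDist(mu, sigma)` takes the standard deviation, so $N(3, 4)$ corresponds to `gauss(3.0, 2.0)` here.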


Kernel Density Estimation

When estimating parameter densities in Chapter 8, we will determine the frequency with which values occur at the $n$ points $x_i$. From this, we wish to compute density values $f_X(x)$ at arbitrary points $x$ in the sample space. We consider nonparametric estimation procedures that do not preassume a parametric form for the density.

In principle, this can be achieved from a histogram of the computed values, as illustrated in Figure 4.5(a). After dividing the sample space into a set of $N$ bins, the density is approximated using the relation

$$f_X(x) \approx \frac{1}{n}\,\frac{\text{Number of } x_i \text{ in same bin as } x}{\text{Width of bin}}.$$

Whereas this approach is simple to implement in one dimension, it has the following disadvantages: the choice of bin locations and numbers can determine the structure of the density, and it is difficult to implement in multiple dimensions.
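The histogram relation can be sketched with equal-width bins. The bin range $[-8, 8]$, the 40 bins, and the standard normal test data are hypothetical choices; the normalization by $n$ times the bin width makes the estimate integrate to 1 when all samples fall in the range.

```python
# Sketch of the histogram density estimate: (count in bin) / (n * bin width).
import random


def histogram_density(samples, lo, hi, n_bins):
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for s in samples:
        if lo <= s <= hi:
            # min() guards against s == hi or rounding pushing the index past the end
            counts[min(int((s - lo) / width), n_bins - 1)] += 1
    n = len(samples)
    heights = [c / (n * width) for c in counts]

    def f_hat(x):
        if x < lo or x > hi:
            return 0.0
        return heights[min(int((x - lo) / width), n_bins - 1)]

    return f_hat, heights, width


rng = random.Random(6)
data = [rng.gauss(0.0, 1.0) for _ in range(10_000)]
f_hat, heights, width = histogram_density(data, -8.0, 8.0, 40)
```

Shifting `lo` or changing `n_bins` alters the estimated shape, which is the sensitivity to bin placement noted above.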

Instead, one often employs kernel density estimation (kde) techniques in which densities are formulated in terms of known kernel functions, as shown in Figure 4.5(b). In one dimension, kernel density representations have the form

$$f_X(x) = \frac{1}{nh}\sum_{i=1}^{n} K\left(\frac{x - x_i}{h}\right), \qquad (4.10)$$

where $K$ is a specified, symmetric pdf (e.g., normal) and $h$ is a smoothing parameter termed the bandwidth [37, 221]. Representations in higher dimensions are analogous.

Figure 4.5. (a) Histogram and approximating density. (b) Kernel basis function and kernel density estimate.

If one has access to the MATLAB Statistics Toolbox, the function ksdensity.m can be employed to construct kernel density estimates. Alternatively, the functions kde.m and kde2d.m, which implement automatic bandwidth selection, are available from the MATLAB Central File Exchange.
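A minimal Python version of (4.10) with a Gaussian kernel is sketched below. The bandwidth default $h = 1.06\,s\,n^{-1/5}$ (the normal reference rule of thumb, with $s$ the sample standard deviation) is an assumption made here for illustration; it is not the automatic selection used by the MATLAB functions named above.

```python
# Sketch of the kernel density estimate (4.10) with a Gaussian kernel K.
# Bandwidth default: normal reference rule h = 1.06 * s * n**(-1/5) (an assumption).
import math
import random
import statistics


def gaussian_kernel(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)


def kde(samples, h=None):
    n = len(samples)
    if h is None:
        h = 1.06 * statistics.stdev(samples) * n ** (-0.2)

    def f_hat(x):
        return sum(gaussian_kernel((x - xi) / h) for xi in samples) / (n * h)

    return f_hat, h


rng = random.Random(7)
data = [rng.gauss(0.0, 1.0) for _ in range(500)]
f_hat, h = kde(data)

# Riemann-sum check that the estimate integrates to about 1
dx = 0.02
mass = sum(f_hat(-6.0 + i * dx) for i in range(601)) * dx
```

Because each kernel is itself a pdf, the estimate is nonnegative and integrates to 1 for any bandwidth; the bandwidth only controls smoothness.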

Inverse Transform Sampling 

111 Definition 1.14. we discussed the use of the function gaoimd.n:, from the 
MATLAB Statistics Toolbox, to construct random realizations from the inverse- 
gamma distribution. Here we summarize a technique to construct realisations of 
a general continuous random variable _Y with absolutely continuous distribution 
function F\- (V), 

For U -- l ■/ ( H n l) , we assume that we have a random number generator capable 
of generating realizations of V , We define the random var iable Y = F ~ 1 (. U ) which 
has 1 lie same distribution as X since 

Fy(y) = P(Y < y) 

= P{F? {U) < y) 

= P(V < F x (y)) 

= Fx (»). 

To generate a realization $x$ of $X$, we generate a realization $u$ of $U$ and define

$$
x = F_X^{-1}(u).
$$

One typically computes $F_X^{-1}(u)$ using numerical algorithms. Even for an arbitrarily fine mesh, the cost of this procedure is typically low.

This technique can be used in lieu of calling gamrnd.m if the MATLAB Statistics Toolbox is unavailable.
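A minimal sketch of inverse transform sampling with a numerically computed $F_X^{-1}$, assuming NumPy. The cdf is tabulated on a mesh by the trapezoid rule and inverted by linear interpolation; the helper name `inverse_transform_sample`, the truncation interval, and the gamma target (shape $a = 2$, scale $b = 1.5$) are illustrative assumptions.

```python
import math
import numpy as np

def inverse_transform_sample(pdf, lower, upper, n, rng, mesh_size=10_001):
    """Draw n realizations of X by numerically inverting its cdf F_X.

    F_X is tabulated on a uniform mesh over [lower, upper] with the trapezoid
    rule, normalized so that F_X(upper) = 1, and inverted by linear
    interpolation: x = F_X^{-1}(u) for u drawn from U(0, 1).
    """
    x = np.linspace(lower, upper, mesh_size)
    f = pdf(x)
    # Cumulative trapezoid rule: F[k] approximates the integral of f over [lower, x[k]].
    F = np.concatenate(([0.0], np.cumsum(0.5 * (f[1:] + f[:-1]) * np.diff(x))))
    F /= F[-1]  # absorbs truncation of the support and quadrature error
    u = rng.uniform(0.0, 1.0, size=n)
    return np.interp(u, F, x)

# Illustrative target: a gamma density with shape a = 2 and scale b = 1.5.
a, b = 2.0, 1.5

def gamma_pdf(x):
    return x ** (a - 1) * np.exp(-x / b) / (math.gamma(a) * b ** a)

rng = np.random.default_rng(2)
samples = inverse_transform_sample(gamma_pdf, 0.0, 60.0, 200_000, rng)
```

The tabulation is done once, so the per-sample cost is a single interpolation, consistent with the remark that the procedure is inexpensive even on a fine mesh.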

4.1.2 Multiple Random Variables 

For most applications, we have multiple parameters, responses, and measurements 
with each being represented by a random variable. We discuss here multiple random 
variables with associated distributions. 

Definition 4.18 (Random Vector). Let $X_1, \ldots, X_n$ be random variables. The vector $X: \Omega \to \mathbb{R}^n$ given by $X = [X_1, X_2, \ldots, X_n]$ is termed a random vector.

Definition 4.19 (Joint CDF). For a random vector $X$, the associated joint cdf $F_X: \mathbb{R}^n \to [0, 1]$ is defined by

$$
F_X(x_1, \ldots, x_n) = P\{\omega \in \Omega \,|\, X_j(\omega) \le x_j,\ j = 1, \ldots, n\}, \qquad (4.11)
$$

which is often written as $F_X(x) = P\{X_1 \le x_1, \ldots, X_n \le x_n\}$.
Consider now the random variables $X_1, \ldots, X_n$, each having an expectation $E(X_i)$. It follows immediately that

$$
E\left( \sum_{i=1}^{n} a_i X_i \right) = \sum_{i=1}^{n} a_i E(X_i), \qquad (4.12)
$$

where $a_1, \ldots, a_n$ are real constants. Furthermore, if the $n$ random variables are independent, then

$$
E(X_1 X_2 \cdots X_n) = E(X_1) E(X_2) \cdots E(X_n). \qquad (4.13)
$$


Definition 4.20 (Covariance and Correlation). The covariance of random variables $X$ and $Y$ is the number

$$
\operatorname{cov}(X, Y) = E[(X - E(X))(Y - E(Y))] = E(XY) - E(X)E(Y), \qquad (4.14)
$$

and the correlation or Pearson correlation coefficient is

$$
\rho_{XY} = \frac{\operatorname{cov}(X, Y)}{\sigma_X \sigma_Y}. \qquad (4.15)
$$


We note that if $X$ and $Y$ are independent, then $\operatorname{cov}(X, Y) = \rho_{XY} = 0$ and the random variables are uncorrelated. The converse is not true in general since the relation (4.15) quantifies only linear dependencies among random variables.
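To see that zero correlation does not imply independence, the following sketch (assuming NumPy; the choice $Y = X^2$ with $X \sim N(0, 1)$ is a standard illustration, not taken from the text) estimates $\rho_{XY}$ for a pair in which $Y$ is a deterministic function of $X$.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.standard_normal(100_000)
Y = X**2  # Y is a deterministic function of X, so X and Y are dependent

# Sample Pearson correlation (4.15): near zero, since E(X^3) = 0 makes cov(X, Y) = 0.
rho_xy = np.corrcoef(X, Y)[0, 1]
# The dependence is visible through a nonlinear statistic: corr(X^2, Y) = 1.
rho_x2y = np.corrcoef(X**2, Y)[0, 1]
```

Here $\operatorname{cov}(X, X^2) = E(X^3) - E(X)E(X^2) = 0$, yet knowing $X$ determines $Y$ exactly.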

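Returning to the case of $n$ random variables, it is shown in [96] that

$$
\operatorname{var}\left( \sum_{i=1}^{n} a_i X_i \right) = \sum_{i=1}^{n} a_i^2 \operatorname{var}(X_i) + 2 \sum_{i<j} a_i a_j \operatorname{cov}(X_i, X_j), \qquad (4.16)
$$

which simplifies to

$$
\operatorname{var}\left( \sum_{i=1}^{n} a_i X_i \right) = \sum_{i=1}^{n} a_i^2 \operatorname{var}(X_i) \qquad (4.17)
$$

if the random variables are pairwise uncorrelated.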
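Relation (4.16) also holds exactly for sample variances and covariances, which gives a quick numerical check. In this sketch, which assumes NumPy, the covariance matrix `V_true` and the coefficients `a` are hypothetical values.

```python
import numpy as np

rng = np.random.default_rng(4)
# Hypothetical correlated data: n = 3 random variables, 10,000 joint realizations (rows).
V_true = np.array([[2.0, 0.6, -0.3],
                   [0.6, 1.0, 0.2],
                   [-0.3, 0.2, 0.5]])
samples = rng.multivariate_normal(np.zeros(3), V_true, size=10_000)
a = np.array([1.0, -2.0, 0.5])

V = np.cov(samples, rowvar=False)  # sample covariance matrix (divisor N - 1)

# Right-hand side of (4.16): sum_i a_i^2 var(X_i) + 2 sum_{i<j} a_i a_j cov(X_i, X_j).
rhs = sum(a[i] ** 2 * V[i, i] for i in range(3))
rhs += 2 * sum(a[i] * a[j] * V[i, j] for i in range(3) for j in range(i + 1, 3))

# Left-hand side: sample variance of the linear combination sum_i a_i X_i.
lhs = np.var(samples @ a, ddof=1)
```

Both sides equal $a^T V a$ in matrix form; with `V_true` that value is 3.025.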


Theorem 4.21. Let $X_1, \ldots, X_n$ be mutually independent, normally distributed random variables with $X_i \sim N(\mu_i, \sigma_i^2)$, and let $a_1, \ldots, a_n$ and $b_1, \ldots, b_n$ be fixed constants. As proven in Corollary 4.6.2 of [62], it then follows that

$$
Z = \sum_{i=1}^{n} (a_i X_i + b_i) \sim N\left( \sum_{i=1}^{n} (a_i \mu_i + b_i),\ \sum_{i=1}^{n} a_i^2 \sigma_i^2 \right). \qquad (4.18)
$$

Like the univariate normal, the multivariate normal distribution plays a central role in many facets of uncertainty quantification.
Definition 4.22 (Multivariate Normal Distribution). The random $n$-vector $X$ is said to be normally distributed with mean $\mu$ and covariance matrix

$$
V = \begin{bmatrix}
\operatorname{var}(X_1) & \operatorname{cov}(X_1, X_2) & \cdots & \operatorname{cov}(X_1, X_n) \\
\operatorname{cov}(X_2, X_1) & \operatorname{var}(X_2) & \cdots & \operatorname{cov}(X_2, X_n) \\
\vdots & \vdots & \ddots & \vdots \\
\operatorname{cov}(X_n, X_1) & \operatorname{cov}(X_n, X_2) & \cdots & \operatorname{var}(X_n)
\end{bmatrix},
$$

designated $X \sim N(\mu, V)$, if the associated density is

$$
f_X(x) = \frac{1}{\sqrt{(2\pi)^n |V|}} \exp\left[ -\frac{1}{2} (x - \mu)^T V^{-1} (x - \mu) \right]. \qquad (4.19)
$$

Here $x = [x_1, \ldots, x_n]$ and $|V|$ is the determinant of $V$.
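A direct evaluation of the density (4.19) is sketched below, assuming NumPy; the helper name `mvn_density` and the test values are hypothetical. It solves a linear system instead of forming $V^{-1}$ explicitly and checks that a diagonal $V$ gives a product of univariate normal densities.

```python
import numpy as np

def mvn_density(x, mu, V):
    """Evaluate the multivariate normal density (4.19) at a single point x."""
    x, mu = np.asarray(x, float), np.asarray(mu, float)
    n = mu.size
    r = x - mu
    # Solve V y = r rather than forming V^{-1} explicitly.
    quad = r @ np.linalg.solve(V, r)
    return np.exp(-0.5 * quad) / np.sqrt((2.0 * np.pi) ** n * np.linalg.det(V))

# For a diagonal covariance, the density factors into univariate normal densities.
mu = np.array([0.0, 1.0])
V = np.diag([1.0, 4.0])
x = np.array([0.5, -1.0])
joint = mvn_density(x, mu, V)
d = np.diag(V)
product = np.prod(np.exp(-0.5 * (x - mu) ** 2 / d) / np.sqrt(2.0 * np.pi * d))
```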

We use the next theorem when constructing proposal functions for the Metropolis algorithms detailed in Chapter 8.

Theorem 4.23. Let $Y = [Y_1, \ldots, Y_n]$ be a normally distributed random vector, $Y \sim N(\mu, V)$, where $V$ is positive definite. Let $Z \sim N(0, I_n)$, where $I_n$ is the $n \times n$ identity. Then $Y \sim RZ + \mu$, that is, $Y$ and $RZ + \mu$ have the same distribution, where $V = RR^T$ and $R$ is a lower triangular matrix.

A proof of this theorem can be found in [96]. We note that the decomposition $V = RR^T$ can be efficiently computed using a Cholesky decomposition.
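Theorem 4.23 translates directly into a sampler. The sketch below assumes NumPy, whose `np.linalg.cholesky` returns the lower triangular factor $R$ with $V = RR^T$. The mean and covariance are hypothetical values.

```python
import numpy as np

rng = np.random.default_rng(6)
mu = np.array([1.0, -1.0])
V = np.array([[1.0, 0.8],
              [0.8, 2.0]])   # hypothetical positive definite covariance

R = np.linalg.cholesky(V)               # lower triangular with V = R R^T
Z = rng.standard_normal((100_000, 2))   # rows are realizations of Z ~ N(0, I_2)
Y = Z @ R.T + mu                        # each row equals (R z + mu)^T
```

Storing realizations as rows is why the product uses $R^T$ on the right.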

Finally, the concepts of marginal and conditional distributions and densities will play an important role in statistical inference. We summarize the definitions for continuous random variables and refer the reader to [112, 171] for analogous definitions for discrete random variables.

Definition 4.24 (Marginal PDF). Let $X_1$ and $X_2$ be jointly continuous random variables with joint pdf $f_X(x_1, x_2)$. The marginal density functions of $X_1$ and $X_2$ are respectively given by

$$
f_{X_1}(x_1) = \int_{\mathbb{R}} f_X(x_1, x_2) \, dx_2, \qquad f_{X_2}(x_2) = \int_{\mathbb{R}} f_X(x_1, x_2) \, dx_1.
$$


A representative marginal density is plotted in Figure 4.6(a). Similarly, for jointly continuous random variables $X_1, \ldots, X_n$ with joint density function $f_X(x_1, \ldots, x_n)$, the marginal pdf of $X_1$ is

$$
f_{X_1}(x_1) = \int_{\mathbb{R}} \cdots \int_{\mathbb{R}} f_X(x_1, x_2, \ldots, x_n) \, dx_2 \cdots dx_n.
$$

Definition 4.25 (Conditional PDF). Let $X_1$ and $X_2$ be jointly continuous random variables with joint pdf $f_X(x_1, x_2)$ and marginal pdfs $f_{X_1}(x_1)$ and $f_{X_2}(x_2)$. The conditional density of $X_1$ given $X_2 = x_2$ is

$$
f_{X_1|X_2}(x_1 | x_2) = \begin{cases}
\dfrac{f_X(x_1, x_2)}{f_{X_2}(x_2)}, & f_{X_2}(x_2) > 0, \\[1ex]
0, & \text{otherwise},
\end{cases}
$$
Figure 4.6. (a) Marginal density and (b) conditional density $f_{X_1|X_2}(x_1 | x_2)$ at a fixed value of $x_2$ for a normal joint density $f_X(x_1, x_2)$.


as plotted in Figure 4.6(b). We note that $f_{X_1|X_2}(x_1 | x_2)$ is a function of $x_1$. The definition for $f_{X_2|X_1}(x_2 | x_1)$ is analogous. Similarly, for $n$ jointly continuous random variables $X_1, \ldots, X_n$ with joint density function $f_X(x_1, \ldots, x_n)$ and marginal density $f_{X_1}(x_1)$, the conditional pdf of $X_2, \ldots, X_n$ given $X_1 = x_1$ is

$$
f_{X_2, \ldots, X_n | X_1}(x_2, \ldots, x_n | x_1) = \frac{f_X(x_1, \ldots, x_n)}{f_{X_1}(x_1)}.
$$
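The marginal and conditional definitions can be checked numerically on a grid. The sketch below assumes NumPy, a hypothetical zero-mean bivariate normal with $V_{11} = 1$, $V_{12} = 0.6$, $V_{22} = 2$, and simple Riemann sums for the integrals. For this density the exact marginal of $X_1$ is $N(0, 1)$, and the conditional given $X_2 = x_2$ is $N(0.3\,x_2, 0.82)$.

```python
import numpy as np

# Hypothetical bivariate normal with mean zero and covariance V.
V = np.array([[1.0, 0.6],
              [0.6, 2.0]])
Vinv = np.linalg.inv(V)
norm_const = 1.0 / (2.0 * np.pi * np.sqrt(np.linalg.det(V)))

h = 0.02
grid = np.arange(-8.0, 8.0 + h / 2, h)
X1, X2 = np.meshgrid(grid, grid, indexing="ij")   # x1 varies along axis 0
quad = Vinv[0, 0] * X1**2 + 2 * Vinv[0, 1] * X1 * X2 + Vinv[1, 1] * X2**2
f = norm_const * np.exp(-0.5 * quad)               # joint pdf f_X(x1, x2)

# Marginal f_{X1}(x1): integrate out x2 (Riemann sum; the tails are negligible).
f_x1 = f.sum(axis=1) * h

# Conditional f_{X1|X2}(x1 | x2 = 1): joint density divided by the marginal f_{X2}(1).
j = int(np.argmin(np.abs(grid - 1.0)))
f_x2_at_1 = f[:, j].sum() * h
f_cond = f[:, j] / f_x2_at_1
```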


Definition 4.26 (iid Random Variables). Random variables $X_1, \ldots, X_n$ are said to be independent and identically distributed (iid) with pdf $g(x)$ if they are mutually independent and the marginal pdf $f_{X_i}(x_i)$ for each $X_i$ is the same function $g(x) = f_{X_1}(x_1) = \cdots = f_{X_n}(x_n)$. The joint pdf for iid random variables is

$$
f_X(x_1, \ldots, x_n) = \prod_{j=1}^{n} f_{X_j}(x_j). \qquad (4.20)
$$



4.2 Estimators, Estimates, and Sampling Distributions

In this section, we summarize concepts pertaining to the estimation of unknown parameters through samples, observations, or measurements. In Section 4.3, we will detail specific techniques to estimate parameters in the context of model calibration. Aspects of statistical inference are further discussed in Section 4.8 and Chapters 7 and 8.

Definition 4.27 (Point and Interval Estimates). Consider a fixed but unknown parameter $q \in Q \subset \mathbb{R}^p$. A point estimate is a vector in $\mathbb{R}^p$ that represents $q$. An interval estimate provides an interval that quantifies the plausible location of components of $q$. The mean, median, and mode of a sampling distribution are examples of point estimates, whereas confidence intervals are interval estimates.
Definition 4.28 (Estimator and Sampling Distribution). An estimator is a rule or procedure that specifies how to construct estimates for $q$ based on random samples $X_1, \ldots, X_n$. Hence the estimator is a random variable with an associated distribution, termed the sampling distribution, which quantifies attributes of the estimation process. The estimate is a realization of the estimator, so it is a function of the realized values $x_1, \ldots, x_n$. An estimator is said to be unbiased if its mean is equal to the value of the parameter being estimated. Otherwise it is said to be biased. Two estimators that we will employ for model calibration are ordinary least squares and maximum likelihood estimators. We will also employ mean, variance, and interval estimators at various points in the discussion.


Definition 4.29 (Statistic). A statistic is a measurable function of one or more 
random variables that does not depend on unknown parameters. 


Example 4.30. Let $X_1, \ldots, X_n$ be random variables associated with a sample of size $n$. Suppose we wish to estimate the population mean $\mu$ and variance $\sigma^2$, which are assumed unknown. This can be accomplished using the estimators, or statistics,

$$
\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i, \qquad S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2, \qquad (4.21)
$$

which are the sample mean and variance. We employ $n - 1$ rather than $n$ in the expression for $S^2$ to ensure that it is unbiased. If we additionally assume that $X_i \sim N(\mu, \sigma^2)$, it is illustrated in [171] that the sampling distributions for $\bar{X}$ and $S^2$ are

$$
\bar{X} \sim N\left( \mu, \frac{\sigma^2}{n} \right), \qquad S^2 \sim \frac{\sigma^2}{n-1} \chi^2(n-1). \qquad (4.22)
$$
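Repeated sampling makes these statements concrete. The sketch below assumes NumPy and a hypothetical normal population with $\mu = 2$ and $\sigma = 3$, using samples of size $n = 5$. It compares the $n - 1$ and $n$ divisors and checks the sampling distributions (4.22) through their first two moments.

```python
import numpy as np

rng = np.random.default_rng(7)
mu, sigma, n, M = 2.0, 3.0, 5, 200_000   # hypothetical population, sample size, repetitions
X = rng.normal(mu, sigma, size=(M, n))   # M independent samples of size n

xbar = X.mean(axis=1)              # realizations of the sample mean
s2 = X.var(axis=1, ddof=1)         # S^2 with the n - 1 divisor from (4.21)
s2_biased = X.var(axis=1, ddof=0)  # divisor n, for comparison
q = (n - 1) * s2 / sigma**2        # should follow chi^2(n - 1): mean n - 1, variance 2(n - 1)
```

The $n$ divisor underestimates $\sigma^2 = 9$ by the factor $(n-1)/n$, giving 7.2 on average here.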


Definition 4.31 (Interval Estimator and Confidence Interval). The goal when constructing an interval estimate is to determine functions $q_L(x)$ and $q_R(x)$ that bound the location $q_L(x) \le q \le q_R(x)$ of $q$ based on realizations $x = [x_1, \ldots, x_n]$ of a random sample $X = [X_1, \ldots, X_n]$. The random interval $[q_L(X), q_R(X)]$ is termed an interval estimator. An interval estimator in combination with a confidence coefficient is commonly called a confidence interval. The confidence coefficient can be interpreted as the frequency of times, in repeated sampling, that the interval will contain the target parameter $q$. The $(1 - \alpha) \times 100\%$ confidence interval is the pair of statistics $(q_L(X), q_R(X))$ such that, for all $q \in Q$,

$$
P[q_L(X) \le q \le q_R(X)] = 1 - \alpha. \qquad (4.23)
$$

As detailed on pages 418-419 of [62], it is important to note that the interval is the random quantity in (4.23), not the parameter.

Example 4.32. Consider a sequence of $n$ random variables $X_1, \ldots, X_n$ from a normal distribution with known variance and unknown mean $\mu$; that is, $X_i \sim N(\mu, \sigma^2)$. To determine information about the unknown mean, we consider the sample mean $\bar{X}$ given by (4.21), which has the sampling distribution given in (4.22). It follows that $(\bar{X} - \mu)/(\sigma/\sqrt{n}) \sim N(0, 1)$ so that

$$
P\left( -2 \le \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le 2 \right) \approx 0.9545
$$

since 95.45% of the area of a normal distribution lies within two standard deviations of the mean. This implies that

$$
P\left( \bar{X} - \frac{2\sigma}{\sqrt{n}} \le \mu \le \bar{X} + \frac{2\sigma}{\sqrt{n}} \right) \approx 0.9545.
$$

Here $[\bar{X} - 2\sigma/\sqrt{n}, \bar{X} + 2\sigma/\sqrt{n}]$ is an interval estimator for $\mu$, where both endpoints are statistics since $\sigma^2$ is considered known. The 95.45% confidence interval is $[\bar{x} - 2\sigma/\sqrt{n}, \bar{x} + 2\sigma/\sqrt{n}]$, where $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$ is the realized sample mean based on $n$ measurements, or realizations, $x_i$ of the random variables $X_i$.
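The repeated-sampling interpretation of (4.23) can be simulated. In this sketch, which assumes NumPy, each hypothetical experiment draws $n = 16$ measurements with known $\sigma = 2$. The code forms the interval $[\bar{x} - 2\sigma/\sqrt{n}, \bar{x} + 2\sigma/\sqrt{n}]$ and counts how often it contains the fixed $\mu$.

```python
import numpy as np

rng = np.random.default_rng(8)
mu, sigma, n, M = 5.0, 2.0, 16, 100_000   # hypothetical true mean, known sigma, sample size, repetitions
X = rng.normal(mu, sigma, size=(M, n))
xbar = X.mean(axis=1)

half_width = 2.0 * sigma / np.sqrt(n)     # 2 sigma / sqrt(n)
lower, upper = xbar - half_width, xbar + half_width
coverage = np.mean((lower <= mu) & (mu <= upper))   # fraction of intervals containing mu
```

The intervals vary from experiment to experiment while $\mu$ stays fixed, and about 95.45% of them contain it.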

Example 4.33. We now turn to the problem of determining the confidence interval for the mean $\mu$ of a normal distribution when the variance $\sigma^2$ is also unknown. To estimate $\sigma^2$, we employ the statistic $S^2$ given by (4.21), which has the $\chi^2$ distribution (4.22). We thus have

$$
Y = \frac{\sqrt{n}(\bar{X} - \mu)}{\sigma} \sim N(0, 1), \qquad Z = \frac{(n-1)S^2}{\sigma^2} \sim \chi^2(n-1),
$$

so that the quotient

$$
T = \frac{Y}{\sqrt{Z/(n-1)}} = \frac{\sqrt{n}(\bar{X} - \mu)}{S}
$$

has a $t$-distribution with $n - 1$ degrees of freedom; see Definition 4.12. To determine a $(1 - \alpha) \times 100\%$ confidence interval for a given value of $n$, we seek values $a$ and $b$ such that

$$
P\left( a \le \frac{\sqrt{n}(\bar{X} - \mu)}{S} \le b \right) = 1 - \alpha.
$$

In Figure 4.3(b), it is shown that the $t$-distribution is symmetric so that $b = -a$, which we denote by $t_{n-1,1-\alpha/2}$ to reflect the $n - 1$ degrees of freedom and probability $1 - \alpha/2$. It then follows that

$$
P\left( \bar{X} - \frac{t_{n-1,1-\alpha/2} S}{\sqrt{n}} \le \mu \le \bar{X} + \frac{t_{n-1,1-\alpha/2} S}{\sqrt{n}} \right) = 1 - \alpha.
$$

One can employ standard tables of $t$-distributions to determine $t_{n-1,1-\alpha/2}$ given $\alpha$ and $n$ and thus specify the $(1 - \alpha) \times 100\%$ confidence interval $[\bar{X} - t_{n-1,1-\alpha/2} S/\sqrt{n}, \bar{X} + t_{n-1,1-\alpha/2} S/\sqrt{n}]$. We remind the reader that for $\alpha = 0.05$, this is a random interval that has a 95% chance of containing the unknown but fixed (deterministic) parameter $\mu$. The interval is constructed by obtaining measurements $x_1, \ldots, x_n$ and employing the realizations $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$ and $s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$ to obtain

$$
\left[ \bar{x} - \frac{t_{n-1,1-\alpha/2} \, s}{\sqrt{n}},\ \bar{x} + \frac{t_{n-1,1-\alpha/2} \, s}{\sqrt{n}} \right].
$$
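A minimal sketch of the final construction, assuming NumPy. The quantile $t_{9, 0.975} \approx 2.262$ for $n = 10$ and $\alpha = 0.05$ is taken from a standard $t$ table, as the text suggests. The helper name `t_interval` and the simulated population are hypothetical. The coverage loop checks the 95% interpretation, and $\sigma$ is used only to generate data.

```python
import numpy as np

# Tabulated quantile t_{9, 0.975} ~ 2.262 for n = 10 and alpha = 0.05.
T_9_0975 = 2.262

def t_interval(x, t_quantile):
    """Return [xbar - t s / sqrt(n), xbar + t s / sqrt(n)] from measurements x."""
    x = np.asarray(x, dtype=float)
    n = x.size
    xbar, s = x.mean(), x.std(ddof=1)
    hw = t_quantile * s / np.sqrt(n)
    return xbar - hw, xbar + hw

rng = np.random.default_rng(9)
mu, sigma, n, M = 1.0, 0.5, 10, 20_000   # hypothetical; sigma is used only to simulate data
hits = 0
for _ in range(M):
    lo, hi = t_interval(rng.normal(mu, sigma, n), T_9_0975)
    hits += lo <= mu <= hi
coverage = hits / M
```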
We will use $t$-distributions in this manner in Chapter 7 to construct confidence intervals for model parameters determined using least squares estimators when $\sigma$ is unknown and the degrees of freedom are relatively small.


4.3 Ordinary Least Squares and Maximum Likelihood Estimators

The process of model calibration entails estimating model parameters, and possibly initial and boundary conditions, based on measured data. More generally, the estimation of model parameters, based on observations, comprises a significant component of statistical inference, which is further discussed in Section 4.8.

To motivate, consider the statistical model

$$
\Upsilon_i = f(t_i, q_0) + \varepsilon_i, \quad i = 1, \ldots, n, \qquad (4.24)
$$

where $\Upsilon_i$ are random variables whose realizations $\upsilon_i$ are a set of $n$ measurements from an experiment and $f(t_i, q)$ is the parameter-dependent model response or QoI at corresponding times. The random variables $\varepsilon_i$ account for errors between the model and measurements. Finally, $q_0$ denotes the true, but unknown, parameter value² that we cannot measure directly but instead must infer from realizations of the random variables $\Upsilon_i$. We emphasize that in this context, $q_0$ is not a random variable.

4.3.1 Ordinary Least Squares (OLS) Estimator 

Consider (4.24) with the assumption that errors $\varepsilon_i$ are iid, unbiased so that $E(\varepsilon_i) = 0$, and have true but unknown variance $\operatorname{var}(\varepsilon_i) = \sigma_0^2$. We assume that the true parameter $q_0$ is in an admissible parameter space $Q$, and we let $\mathcal{Q}$ denote the corresponding sample space. As illustrated in the examples of Chapter 7, these spaces typically coincide.

The OLS estimator and estimate³

$$
q_{OLS} = \underset{q \in Q}{\operatorname{argmin}} \sum_{i=1}^{n} [\Upsilon_i - f(t_i, q)]^2, \qquad
\hat{q}_{OLS} = \underset{q \in Q}{\operatorname{argmin}} \sum_{i=1}^{n} [\upsilon_i - f(t_i, q)]^2 \qquad (4.25)
$$

are the random variable and realization in $\mathbb{R}^p$ that minimize the respective sum of squares errors, as illustrated in Figure 4.7(a). Details regarding the distribution of $q_{OLS}$, under various assumptions regarding the distribution of the errors, are provided in Chapter 7.

² As detailed in Section 3.4, $\theta$ is typically employed to represent calibration parameters in the statistics literature, whereas other conventions are common in the mathematics, engineering, and science literature. We use the mathematics notation $q$ due to the flexibility that it provides for representing both physical and calibration parameters as well as their admissible parameter spaces.

³ The use of the notation $q_{OLS}$ to indicate the estimator is not universal, and many texts denote the least squares estimate by the hat notation. Hence care must be taken to establish the convention employed in the specific text.

Figure 4.7. (a) Ordinary least squares solution $\hat{q}_{OLS}$ to (4.25) and (b) maximum likelihood estimate $\hat{q}_{MLE}$ given by (4.27).
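For a model that is linear in $q$, the OLS estimate (4.25) reduces to a linear least squares problem. The sketch below assumes NumPy and a hypothetical model $f(t, q) = q_1 + q_2 t$ with synthetic data. Models that are nonlinear in $q$ would instead require an iterative optimizer, which is not shown here.

```python
import numpy as np

rng = np.random.default_rng(10)
# Hypothetical linear model f(t, q) = q[0] + q[1] * t with true q0 = [1.0, 0.5].
q0 = np.array([1.0, 0.5])
sigma0 = 0.2
t = np.linspace(0.0, 10.0, 101)
upsilon = q0[0] + q0[1] * t + rng.normal(0.0, sigma0, t.size)   # realizations of (4.24)

# For a model linear in q, the OLS estimate solves a linear least squares problem.
A = np.column_stack([np.ones_like(t), t])   # sensitivity (design) matrix
q_ols, *_ = np.linalg.lstsq(A, upsilon, rcond=None)

def sum_of_squares(q):
    """Sum of squares error in (4.25) for the realized data upsilon."""
    return np.sum((upsilon - (q[0] + q[1] * t)) ** 2)
```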

4.3.2 Maximum Likelihood Estimator 

Maximum likelihood estimators can also be used to achieve the objective of estimating a parameter $q$ based on random samples $\Upsilon_1, \ldots, \Upsilon_n$.

Definition 4.34 (Likelihood Function). Let $f_\Upsilon(\upsilon; q)$ be a parameter-dependent joint pdf associated with a random vector $\Upsilon = [\Upsilon_1, \ldots, \Upsilon_n]$, where $q \in Q$ is an unknown parameter vector, and let $\upsilon = [\upsilon_1, \ldots, \upsilon_n]$ be a realization of $\Upsilon$. The likelihood function $L_\upsilon: Q \to [0, \infty)$ is defined by

$$
L_\upsilon(q) = L(q | \upsilon) = f_\Upsilon(\upsilon; q), \qquad (4.26)
$$

where the observed sample $\upsilon$ is fixed and $q$ varies over all admissible parameter values. The notation $L_\upsilon(q)$ is somewhat nonstandard, but it highlights the fact that the independent variable is $q$. Some authors use the notation

$$
L(q) = L(q | d) = f_\Upsilon(d | q),
$$

where $d = [d_1, \ldots, d_n]$ denotes the outcome from a random experiment, to reinforce this concept.

We note that because $L$ is a function of $q$, it is not a pdf, and the notation $L(q | \upsilon)$, while standard, should not be interpreted as a conditional pdf. If $\Upsilon$ is discrete, then $L_\upsilon(q)$ is the probability of obtaining the data $\upsilon$ for a given parameter value $q$. If $\Upsilon$ is continuous, then $L_\upsilon(q)$ is not itself a probability, but a constant of proportionality can be combined with Riemann sum approximations of the integral to obtain a similar interpretation.

For $n$ iid random variables, it follows from (4.20) that the likelihood function is

$$
L_\upsilon(q) = \prod_{i=1}^{n} f_{\Upsilon_i}(\upsilon_i; q).
$$

Finally, we denote the log-likelihood function by

$$
\ell_\upsilon(q) = \ell(q | \upsilon) = \ln L(q | \upsilon).
$$
Example 4.35. Consider the binomial distribution with probability of success $q$. The probability mass function

$$
f(\upsilon; q, n) = P(\Upsilon = \upsilon \,|\, n, q) = \binom{n}{\upsilon} q^\upsilon (1 - q)^{n - \upsilon}
$$

quantifies the probability of obtaining exactly $\upsilon = 0, 1, \ldots, n$ successes in a sequence of $n$ experiments. In this function, $q$ and $n$ are known and $\upsilon$ is unknown. Although the likelihood

$$
L(q | \upsilon, n) = \binom{n}{\upsilon} q^\upsilon (1 - q)^{n - \upsilon}
$$

has the same functional form, the independent variable now is $q$, and $\upsilon$ and $n$ are known. Hence the likelihood function is continuous, whereas the probability mass function is discrete.
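The distinction between the pmf and the likelihood can be made numerical. The sketch below assumes NumPy, Python's `math.comb`, and hypothetical data of $\upsilon = 3$ successes in $n = 10$ trials. Summed over $\upsilon$, the pmf gives one. Integrated over $q$, the likelihood gives $1/(n+1)$, a standard Beta-integral identity, which confirms that it is not a pdf in $q$. The grid maximizer reproduces the analytic MLE $\upsilon/n$.

```python
import math
import numpy as np

n, upsilon = 10, 3   # hypothetical data: 3 successes observed in 10 trials

# As a pmf in upsilon (q fixed), the probabilities sum to one.
q_fixed = 0.4
pmf_total = sum(math.comb(n, k) * q_fixed**k * (1 - q_fixed) ** (n - k) for k in range(n + 1))

# As a likelihood in q (upsilon fixed), it need not integrate to one.
q = np.linspace(0.0, 1.0, 100_001)
L = math.comb(n, upsilon) * q**upsilon * (1 - q) ** (n - upsilon)
dq = q[1] - q[0]
L_integral = dq * (L.sum() - 0.5 * (L[0] + L[-1]))   # trapezoid rule
q_mle = q[np.argmax(L)]   # grid maximizer; analytically q = upsilon / n
```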


Estimates for $q_0$ are commonly constructed by computing the value of $q$ that maximizes the likelihood, which is termed a maximum likelihood estimate (MLE). For iid samples, the MLE is

$$
\hat{q}_{MLE} = \underset{q \in Q}{\operatorname{argmax}} \prod_{i=1}^{n} f_{\Upsilon_i}(\upsilon_i; q).
$$


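To illustrate, we consider (4.24) with the assumption that errors are iid, unbiased, and normally distributed with true but unknown variance $\sigma_0^2$ so that $\varepsilon_i \sim N(0, \sigma_0^2)$ and hence $\Upsilon_i \sim N(f(t_i, q_0), \sigma_0^2)$. In this case, $q$ and $\sigma^2$ are both parameters and the likelihood function is

$$
L(q, \sigma^2 | \upsilon) = \frac{1}{(2\pi\sigma^2)^{n/2}} \exp\left[ -\frac{1}{2\sigma^2} \sum_{i=1}^{n} [\upsilon_i - f(t_i, q)]^2 \right]. \qquad (4.27)
$$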


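The MLE for $q_0$ and $\sigma_0^2$ is

$$
(\hat{q}_{MLE}, \hat{\sigma}^2_{MLE}) = \underset{q \in Q,\ \sigma^2 \in (0, \infty)}{\operatorname{argmax}} L(q, \sigma^2 | \upsilon), \qquad (4.28)
$$

where $\hat{q}_{MLE}$ is depicted in Figure 4.7(b).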

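Due to the monotonicity of the logarithm function, maximizing $L(q, \sigma^2 | \upsilon)$ is equivalent to maximizing the log-likelihood

$$
\ell(q, \sigma^2 | \upsilon) = -\frac{n}{2} \ln(2\pi) - \frac{n}{2} \ln(\sigma^2) - \frac{1}{2\sigma^2} \sum_{i=1}^{n} [\upsilon_i - f(t_i, q)]^2.
$$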


From a computational perspective, the log-likelihood is advantageous, so it is commonly employed in algorithms. For fixed $\sigma^2$, the condition $\nabla \ell(q, \sigma^2 | \upsilon) = 0$ yields

$$
\sum_{i=1}^{n} [\upsilon_i - f(t_i, q)] \nabla f(t_i, q) = 0, \qquad (4.29)
$$
where $\nabla f$ denotes the gradient of $f$ with respect to $q$. Since (4.29) is also the first-order optimality condition for the sum of squares in (4.25), it is observed that with the assumption of iid, unbiased, normally distributed errors, the maximum likelihood solution $\hat{q}_{MLE}$ to (4.29) is the same as the least squares estimate $\hat{q}_{OLS}$ specified by (4.25). The equivalence between minimizing the sum of squares error and maximizing the likelihood will be utilized when we construct proposal functions for the MCMC techniques in Chapter 8.
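The sketch below illustrates both points under stated assumptions: NumPy, a hypothetical linear model $f(t, q) = q_1 + q_2 t$, and synthetic normal errors with $\sigma_0 = 2$. The OLS estimate also maximizes the log-likelihood for fixed $\sigma^2$. Setting $\partial \ell / \partial \sigma^2 = 0$ gives $\hat{\sigma}^2_{MLE} = \frac{1}{n}\sum_i [\upsilon_i - f(t_i, \hat{q})]^2$. The raw likelihood, a product of 2000 densities, underflows in double precision, whereas the log-likelihood does not.

```python
import numpy as np

rng = np.random.default_rng(11)
# Hypothetical data: f(t, q) = q[0] + q[1] * t, q0 = [1.0, 0.5], errors N(0, 2^2).
t = np.linspace(0.0, 10.0, 2_000)
upsilon = 1.0 + 0.5 * t + rng.normal(0.0, 2.0, t.size)

def model(q):
    return q[0] + q[1] * t

def log_likelihood(q, sigma2):
    """Log of the normal likelihood (4.27) for iid N(0, sigma2) errors."""
    r = upsilon - model(q)
    n = t.size
    return -0.5 * n * np.log(2.0 * np.pi * sigma2) - np.sum(r**2) / (2.0 * sigma2)

# The OLS estimate (4.25), computed by linear least squares.
A = np.column_stack([np.ones_like(t), t])
q_ols, *_ = np.linalg.lstsq(A, upsilon, rcond=None)
# Setting the sigma^2-derivative of the log-likelihood to zero gives SSE / n.
sigma2_mle = np.sum((upsilon - model(q_ols)) ** 2) / t.size

# The raw likelihood multiplies 2000 densities that are each below about 0.21,
# so it underflows to 0.0 in double precision; the log-likelihood stays finite.
dens = np.exp(-((upsilon - model(q_ols)) ** 2) / (2.0 * sigma2_mle)) / np.sqrt(2.0 * np.pi * sigma2_mle)
raw_likelihood = np.prod(dens)
ll_max = log_likelihood(q_ols, sigma2_mle)
```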

In frequentist inference, the MLE $\hat{q}_{MLE}$ is the parameter value that makes the observed output most likely. It should not be interpreted as the most likely parameter value resulting from the data since this would require the parameter to be a random variable, which contradicts the tenets of frequentist analysis.


4.4 Modes of Convergence and Limit Theorems 

There are several modes of convergence for sequences of random variables and distributions that are important for our discussion. We summarize the definitions and refer the reader to [62, 81, 82] for additional details, examples, and proofs of related theorems.


Definition 4.36 (Convergence in Probability). A sequence $X_1, X_2, \ldots$ of random variables converges in probability to a random variable $X$, written as $X_n \xrightarrow{P} X$, if for every $\varepsilon > 0$,

$$
\lim_{n \to \infty} P(|X_n - X| \ge \varepsilon) = 0 \quad \text{or, equivalently,} \quad \lim_{n \to \infty} P(|X_n - X| < \varepsilon) = 1.
$$

Note that $X_1, X_2, \ldots$ are typically not iid in this and the following definitions. This mode of convergence is weaker than almost sure convergence.


Definition 4.37 (Almost Sure Convergence). A sequence $X_1, X_2, \ldots$ of random variables converges almost surely to a random variable $X$, written as $X_n \xrightarrow{a.s.} X$, if for every $\varepsilon > 0$,

$$
P\left( \lim_{n \to \infty} |X_n - X| < \varepsilon \right) = 1.
$$

Almost sure convergence is sometimes referred to as convergence with probability 1. Examples of sequences that converge in probability but not almost surely are provided in [62].


Definition 4.38 (Convergence in Distribution). Let $X_1, X_2, \ldots$ be a sequence of random variables with corresponding distributions $F_{X_1}(x), F_{X_2}(x), \ldots$. If $F_X(x)$ is a distribution function and

$$
\lim_{n \to \infty} F_{X_n}(x) = F_X(x)
$$

at all points $x$ where $F_X(x)$ is continuous, then $X_n$ is said to have a limiting random variable $X$ with distribution function $F_X(x)$. In this case, $X_n$ is said to converge in distribution to $X$, which is often written as $X_n \xrightarrow{D} X$. Care must be taken when using this notation since the convergence of random variables is defined in terms of the convergence of the distributions. Hence this mode of convergence is quite different from the previous two.


We note that almost sure convergence implies convergence in probability, 
which in turn implies convergence in distribution. Hence convergence in distri- 
bution is the weakest of the three concepts. 


Definition 4.39 (Consistent Estimator). A sequence $q_n$ of estimators is said to be consistent, or weakly consistent, if it converges in probability to the value $q_0$ of the parameter being estimated. In practice, we often construct estimators that are a function of the sample size $n$. In this case, the estimator is consistent if the sequence converges in probability to $q_0$ as the number of samples tends to infinity.


Law of Large Numbers and Central Limit Theorem 

The law of large numbers and central limit theorem are two of the pillars of probability theory. To motivate them, we consider the problem of estimating the unknown mean $\mu$ and variance $\sigma^2$ of a population based on samples $x_1, x_2, \ldots$ and associated random variables $X_1, X_2, \ldots$. An estimator for the mean is

$$
\bar{X}_n = \frac{1}{n} \sum_{i=1}^{n} X_i, \qquad (4.30)
$$

so a natural question is the following: Does $\lim_{n \to \infty} \bar{X}_n = \mu$? This is addressed by the strong and weak laws of large numbers.


Theorem 4.40 (Strong Law of Large Numbers). Let $X_1, X_2, \ldots$ be iid random variables with $E(X_i) = \mu$ and $\operatorname{var}(X_i) = \sigma^2 < \infty$, and define $\bar{X}_n$ by (4.30). Then for every $\varepsilon > 0$,

$$
P\left( \lim_{n \to \infty} |\bar{X}_n - \mu| < \varepsilon \right) = 1 \quad \text{or} \quad \bar{X}_n \xrightarrow{a.s.} \mu.
$$

The formulation of the weak law of large numbers is similar except $\bar{X}_n \xrightarrow{P} \mu$. These laws are of fundamental importance since they establish that the random sample adequately represents the population in the sense that $\bar{X}_n$ converges to the mean $\mu$.
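The behavior described by Theorem 4.40 can be observed along a single sample path of running means. The sketch below assumes NumPy and a hypothetical exponential population with mean $\mu = 2$, which is deliberately non-normal.

```python
import numpy as np

rng = np.random.default_rng(12)
mu = 2.0  # mean of a hypothetical exponential population (variance mu^2 = 4)
X = rng.exponential(scale=mu, size=1_000_000)
# Running sample means X_bar_n from (4.30) for n = 1, 2, ..., 10^6.
running_mean = np.cumsum(X) / np.arange(1, X.size + 1)
for n in (10, 1_000, 100_000, 1_000_000):
    print(n, abs(running_mean[n - 1] - mu))
```

The printed errors shrink roughly like $\sigma/\sqrt{n}$, but the rate is a statement about the central limit theorem discussed next, not about the law of large numbers itself.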

Given the central role of the sample mean, it is natural to question the degree to which its sampling distribution can be established. In Example 4.30, we noted that if $X_i \sim N(\mu, \sigma^2)$, then $\bar{X} \sim N(\mu, \sigma^2/n)$. The requirement of normally distributed random variables is quite restrictive, however, so we relax this assumption and pose the same question in the context of iid random variables from an arbitrary distribution. The remarkable answer is provided by the central limit theorem.


Theorem 4.41 (Central Limit Theorem). Let $X_1, X_2, \ldots$ be iid random variables with $E(X_i) = \mu$ and $\operatorname{var}(X_i) = \sigma^2 < \infty$. Furthermore, let $\bar{X}_n$ be given by (4.30), and let $G_n(x)$ denote the cdf of the random variable $\sqrt{n}(\bar{X}_n - \mu)/\sigma$. Then

$$
\lim_{n \to \infty} G_n(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}} e^{-y^2/2} \, dy
$$

so that the limiting distribution of $\sqrt{n}(\bar{X}_n - \mu)/\sigma$ is a normal distribution $N(0, 1)$. The theorem is often expressed as

$$
\frac{\sqrt{n}(\bar{X}_n - \mu)}{\sigma} \xrightarrow{D} Z,
$$

where $Z \sim N(0, 1)$. Because of this limit, for large $n$ we have the approximation

$$
\bar{X}_n \;\dot{\sim}\; N\left( \mu, \frac{\sigma^2}{n} \right), \qquad (4.31)
$$

where $\dot{\sim}$ denotes "is approximately distributed as"; that is, $\bar{X}_n$ is approximately normal for sufficiently large $n$. This result is similar to that noted in Example 4.30 for $X_i \sim N(\mu, \sigma^2)$ but with the major difference that (4.31) holds in an asymptotic sense for $X_i$ from an arbitrary distribution with finite variance as long as $n$ is sufficiently large.

From a broad perspective, the combination of the law of large numbers and central limit theorem establishes that for sufficiently large $n$, samples are representative of the population (in the sense of the means) and the means of these samples behave asymptotically as normal distributions. The question as to how large $n$ must be to ensure this asymptotic behavior is problem-dependent, and the assumption of approximate normality can be questionable when sample sizes are small.
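A quick check of the central limit theorem, assuming NumPy and a hypothetical exponential(1) population, which is skewed and has $\mu = \sigma = 1$. The standardized means $\sqrt{n}(\bar{X}_n - \mu)/\sigma$ for $n = 50$ fall in $[-1.96, 1.96]$ about 95% of the time. Their skewness, roughly $2/\sqrt{n}$ for this population, shows the residual departure from normality at moderate $n$.

```python
import numpy as np

rng = np.random.default_rng(13)
mu = sigma = 1.0          # an exponential(1) population has mean 1 and variance 1
n, M = 50, 100_000        # hypothetical sample size and number of repetitions
X = rng.exponential(scale=1.0, size=(M, n))
Z = np.sqrt(n) * (X.mean(axis=1) - mu) / sigma   # standardized sample means

# Under the normal approximation, about 95% of Z should lie in [-1.96, 1.96].
frac_inside = np.mean(np.abs(Z) <= 1.96)
# The skewness of Z decays like 2 / sqrt(n), so moderate n leaves visible asymmetry.
skew = np.mean(Z**3)
```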

We will invoke the asymptotic normality provided by the central limit theorem in Chapter 7 when constructing sampling distributions for model parameters.


4.5 Random Processes 


In Section 4.1, we summarized the framework associated with random variables and random vectors. However, uncertainty quantification in the context of differential equation models can yield variables that exhibit time or space dependence in addition to randomness. This necessitates the discussion of stochastic or random processes and fields. We will also see that the Markov chain Monte Carlo (MCMC) techniques of Chapter 8 rely on the theory of stochastic processes.

To motivate our discussion of random processes, consider first the ODE

$$
\frac{du}{dt} = -\alpha(\omega) u, \quad t > 0, \qquad u(0, \omega) = \beta(\omega), \qquad (4.32)
$$


where $\alpha$ and $\beta$ are random variables and $\omega \in \Omega$ is an event in an underlying probability space. It was noted in Example 3.1 that for every time $t$ the solution $u(t, \omega)$ is a random variable, so that the random solution $u(t, \omega)$ is an example of a stochastic or random process.

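Now consider the PDE

$$
\begin{aligned}
\frac{\partial T}{\partial t} &= \frac{\partial}{\partial x}\left( \alpha(x, \omega) \frac{\partial T}{\partial x} \right) + f(t, x), \quad -1 < x < 1,\ t > 0, \\
T(t, -1) &= T_\ell, \quad T(t, 1) = T_r, \quad t > 0, \\
T(0, x) &= T_0(x), \quad -1 < x < 1,
\end{aligned}
\qquad (4.33)
$$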
which, as detailed in Example 3.5, models the flow of heat in a structure having uncertain diffusivity $\alpha$. Here $\alpha$ is an example of a random field and the solution $T(t, x, \omega)$ is random for all pairs $(t, x)$ of independent variables.

Definition 4.42 (Stochastic Process). A stochastic or random process is an indexed collection

$$
X = \{X_t, t \in T\} = \{X(t), t \in T\}
$$

of random variables, all of which are defined on the same probability space $(\Omega, \mathcal{F}, P)$. The index set $T$ is typically assumed to be totally ordered and often is taken to be time. Taking $T$ to be a subset of consecutive integers yields a discrete random process, whereas taking $T$ to be an interval of real numbers yields a continuous process.

The random solution $u(t, \omega)$ to (4.32) is an example of a continuous random process. In the next section, we will devote significant discussion to Markov chains, which are discrete random processes, since they are central to the Metropolis methods used in the Bayesian analysis of Chapter 8 to quantify parameter densities.

Other ordered index sets can be considered, including spatial points or intervals. However, ordering in dimensions greater than one is complicated, so we employ the terminology stochastic or random fields for spatially varying quantities. A stochastic process can be interpreted in three ways.

(i) $X$ is a function on $T \times \Omega$ with the realization $X_t(\omega)$ for $t \in T$ and $\omega \in \Omega$.

(ii) For fixed $t \in T$, $X_t$ is a random variable.

(iii) For an outcome $\omega \in \Omega$, the realization $X_t(\omega)$ is a function of $t$ that is often called the sample path or trajectory associated with $\omega$.

We note that continuous stochastic processes are infinite-dimensional and extreme care must be taken when extending finite-dimensional convergence results to these cases. The following class of random processes is important since the concepts of mean, covariance, and correlation functions are well defined for these processes.

Definition 4.43 (Second-Order Stochastic Process). A second-order stochastic process is one for which $E(X_t^2) < \infty$ for all $t \in T$.

For second-order random processes, the random variable concepts of mean and covariance can be directly extended using interpretation (ii). Specifically, the expectation and covariance functions of $X$ are defined as

$$
\begin{aligned}
\mu(t) &= E(X_t), \quad t \in T, \\
C(t, s) &= \operatorname{cov}(X_t, X_s) = E[(X_t - \mu(t))(X_s - \mu(s))], \quad t, s \in T.
\end{aligned}
\qquad (4.34)
$$

Hence $\mu(t)$ quantifies the centrality of sample paths, whereas $C(t, s)$ quantifies their variability about $\mu(t)$.
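The mean and covariance functions (4.34) can be estimated from simulated sample paths. The sketch below assumes NumPy and applies interpretation (iii) to the ODE (4.32), whose sample paths are $u(t, \omega) = \beta(\omega) e^{-\alpha(\omega) t}$. The distributions $\alpha \sim \mathcal{U}(0.5, 1.5)$ and $\beta \sim N(1, 0.1^2)$, taken to be independent, are hypothetical choices. With these choices $\mu(t) = (e^{-0.5t} - e^{-1.5t})/t$ for $t > 0$ and $C(0, 0) = \operatorname{var}(\beta) = 0.01$.

```python
import numpy as np

rng = np.random.default_rng(14)
M = 50_000
alpha = rng.uniform(0.5, 1.5, size=M)      # hypothetical alpha ~ U(0.5, 1.5)
beta = rng.normal(1.0, 0.1, size=M)        # hypothetical beta ~ N(1, 0.1^2), independent of alpha
t = np.linspace(0.0, 5.0, 51)

# Each row is a sample path u(t, omega) = beta exp(-alpha t) of (4.32).
paths = beta[:, None] * np.exp(-alpha[:, None] * t[None, :])

mu_hat = paths.mean(axis=0)              # estimate of mu(t) = E(u(t, .))
C_hat = np.cov(paths, rowvar=False)      # estimate of C(t_i, t_j) on the time grid
```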
Definition 4.44 (Gaussian Process). A Gaussian process (GP) is a continuous-time stochastic process $X$ such that all finite-dimensional vectors $X_t = [X_{t_1}, \ldots, X_{t_n}]$ have a multivariate normal distribution; that is,

$$
X_t \sim N(\mu(t), C(t)),
$$

where $t = [t_1, \ldots, t_n]$, $\mu(t) = [E(X_{t_1}), \ldots, E(X_{t_n})]$, and $[C(t)]_{ij} = \operatorname{cov}(X_{t_i}, X_{t_j})$ for all $1 \le i, j \le n$. A GP is thus a probability distribution for a function.
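Because every finite-dimensional vector of a GP is multivariate normal, sample paths on a grid can be drawn with the Cholesky construction of Theorem 4.23. This sketch assumes NumPy. The squared-exponential covariance, its length scale, the mean $\mu(t) = \sin(\pi t)$, and the small diagonal jitter added for numerical stability are all hypothetical choices.

```python
import numpy as np

def sq_exp_cov(t, s, variance=1.0, length=0.5):
    """Squared-exponential covariance C(t, s); a hypothetical choice of kernel."""
    return variance * np.exp(-0.5 * (t[:, None] - s[None, :]) ** 2 / length**2)

rng = np.random.default_rng(15)
t = np.linspace(0.0, 2.0, 41)
mu_t = np.sin(np.pi * t)                  # hypothetical mean function mu(t)
C_t = sq_exp_cov(t, t)
# A small diagonal "jitter" keeps the Cholesky factorization numerically stable.
R = np.linalg.cholesky(C_t + 1e-6 * np.eye(t.size))
Z = rng.standard_normal((5_000, t.size))
paths = mu_t + Z @ R.T                    # each row is one realization of [X_{t_1}, ..., X_{t_n}]
```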

The concept of stationarity is important in the theory of Markov chains since it provides criteria specifying when MCMC methods can be expected to converge to posterior distributions for parameters. We consider this in the context of a discrete index set $T$ but note that a similar definition holds for continuous index sets.

Definition 4.45 (Stationary Random Process). The random process $X$ is said to be stationary if, for any $t_1, t_2, \ldots, t_n \in T$ and $s$ such that $t_1 + s, \ldots, t_n + s \in T$, the random vectors $[X_{t_1}, \ldots, X_{t_n}]$ and $[X_{t_1 + s}, \ldots, X_{t_n + s}]$ have the same distribution.

For a stationary process, $\mu(t)$ is constant for all $t \in T$ and $C(t, s) = C(t - s)$ is a function only of the time difference $|t - s|$.

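Definition 4.46 (Autoregressive (AR) Models). An AR(1) process, or time series, $X$ satisfies

$$
X_t = \rho_1 X_{t-1} + \varepsilon_t, \qquad \varepsilon_t \sim N(0, \sigma^2), \qquad (4.35)
$$

where $\rho_1$ is a parameter. If $|\rho_1| < 1$, the process is said to be wide-sense stationary. In this case, $E(X_t) = E(X_{t-1})$, so taking expectations in (4.35) gives $E(X_t) = \rho_1 E(X_t)$ and hence $E(X_t) = 0$. Similarly, since $\varepsilon_t$ is independent of $X_{t-1}$, $\operatorname{var}(X_t) = E(X_t^2) = \rho_1^2 E(X_{t-1}^2) + \sigma^2$, so that $\operatorname{var}(X_t) = \sigma^2/(1 - \rho_1^2)$. We note that an AR(1) process smooths the output in the sense of a low-pass filter.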

An AR(p) process satisfies

$$
X_t = \sum_{k=1}^{p} \rho_k X_{t-k} + \varepsilon_t, \qquad \varepsilon_t \sim N(0, \sigma^2). \qquad (4.36)
$$

We note that AR(p) processes are a type of GP.
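A direct simulation of (4.36), assuming NumPy. The helper name `simulate_ar`, the zero start, and the burn-in length are hypothetical choices. The burn-in discards the transient before the process settles into its stationary behavior. For AR(1) with $\rho_1 = 0.8$ and $\sigma = 1$, the stationary variance derived above is $\sigma^2/(1 - \rho_1^2) \approx 2.78$, and the lag-1 correlation is $\rho_1$.

```python
import numpy as np

def simulate_ar(rho, sigma, n_steps, rng, burn_in=1_000):
    """Simulate X_t = sum_k rho[k-1] X_{t-k} + eps_t with eps_t ~ N(0, sigma^2), starting from zeros."""
    p = len(rho)
    x = np.zeros(n_steps + burn_in + p)
    eps = rng.normal(0.0, sigma, size=x.size)
    for t in range(p, x.size):
        # x[t-p:t][::-1] lists X_{t-1}, X_{t-2}, ..., X_{t-p}
        x[t] = np.dot(rho, x[t - p:t][::-1]) + eps[t]
    return x[p + burn_in:]   # discard the start-up transient

rng = np.random.default_rng(16)
rho1, sigma = 0.8, 1.0
x = simulate_ar([rho1], sigma, 200_000, rng)
```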

Definition 4.47 (Random Field). The concept of a random field generalizes that of a random process by allowing indices that are vector-valued or points on a manifold. Specifically, a random field is a collection

$$
X = \{X_x, x \in \mathcal{X}\}
$$

of random variables indexed by elements $x$ in a topological space $\mathcal{X}$. For our applications, we will employ random fields to quantify uncertain spatially varying parameters such as $\alpha(x, \omega)$ in (4.33).


For the definitions of random processes and random fields, we have considered indexed families of random variables which, for fixed values of the index, map $\Omega$ to $\mathbb{R}$. When describing Markov processes, however, it is advantageous to generalize this concept to include random variables that map into a state space $S$. This is established in the following definitions.

Definition 4.48 (S-Valued Random Variable). Let $S$ be a finite or countable set termed the state space. An $S$-valued random variable is a function $X: \Omega \to S$ such that $\{\omega \in \Omega \,|\, X(\omega) \le x\} \in \mathcal{F}$ for each $x \in S$ if there is an ordering on $S$. Note that this is exactly Definition 4.3 if $S = \mathbb{R}$.

Definition 4.49. A random process $X$ is said to have a state space $S$ if $X_t$ is an $S$-valued random variable for each $t \in T$.


4.6 Markov Chains 

In Chapter 8, we will employ Markov chain Monte Carlo (MCMC) methods to 
construct posterior densities for model parameters. We summarize here the funda- 
mental properties of Markov chains necessary for that development. 

Broadly stated, a stochastic process is said to satisfy the Markov property if the probability of future states is dependent only on the present state rather than the sequence of past events that precede it. This is completely analogous to the state space concept of modeling in which a system is defined in terms of state variables that uniquely define the behavior at time $t$. When combined with dynamics encompassed in the model, the future state behavior can be completely defined. Both Markov processes and state space models are memoryless in the sense that the past history is not required to make future predictions. Whereas Markov processes can be defined for both continuous and discrete index sets $T$, we focus solely on the latter since it provides the setting necessary for MCMC analysis. Discrete-time Markov processes are usually called Markov chains, although some authors also use this designation for continuous-time processes.

Definition 4.50 (Markov Chain). A Markov chain is a sequence of $S$-valued random variables

$$ X = \{X_n\}_{n \ge 0} $$

that satisfy the Markov property that $X_{n+1}$ depends only on $X_n$; that is,

$$ P(X_{n+1} = x_{n+1} \mid X_0 = x_0, \ldots, X_n = x_n) = P(X_{n+1} = x_{n+1} \mid X_n = x_n), \qquad (4.37) $$

where $x_i$ is the state of the chain at time $i$.

A Markov chain is characterized by three components: a state space $S$, an initial distribution $p^0$, and a transition or Markov kernel. As indicated in Definition 4.48, the state space is the range of all random variables, so it is the set of all possible realizations. We assume a finite number $k$ of discrete states, so $S = \{x_1, \ldots, x_k\}$. The initial distribution quantifies the starting configuration for the chain, whereas the transition kernel quantifies the probability of transitioning from state $x_i$ to $x_j$, so it establishes how the chain evolves. For our discussion, we assume that the transition probabilities are the same for all time, which yields a homogeneous Markov chain.

We let $p_{ij}$ denote the probability of moving from $x_i$ to $x_j$ in one step, so that

$$ p_{ij} = P(X_{n+1} = x_j \mid X_n = x_i). $$

The resulting transition matrix is

$$ P = [p_{ij}], \quad 1 \le i, j \le k. $$

We will also be interested in the probability of transitioning between states in $m$ steps, which we denote by

$$ p_{ij}^{(m)} = P(X_{n+m} = x_j \mid X_n = x_i), $$

with the corresponding $m$-step transition matrix

$$ P^{(m)} = [p_{ij}^{(m)}]. $$

The initial density, which is often termed mass when it is discrete, is given by

$$ p^0 = [p_1^0, \ldots, p_k^0], $$

where $p_i^0 = P(X_0 = x_i)$. Because $p^0$ and $P$ contain probabilities, their entries are nonnegative, and the elements of $p^0$ and the rows of $P$ must sum to unity. Matrices satisfying this property are termed row-stochastic matrices.

Given an initial distribution and transition kernel, the distribution after 1 step is $p^1 = p^0 P$, and

$$ p^n = p^{n-1} P = p^0 P^n $$

after $n$ steps. We illustrate these concepts in the next example.


Example 4.51. Various studies have indicated that factors such as weather, injuries, and unquantifiable concepts such as hitting streaks lend a random nature to baseball [7]. We assume that a team that won its previous game has a 70% chance of winning its next game and a 30% chance of losing, whereas a losing team wins 40% and loses 60% of its next games. Hence the probability of winning or losing the next game is conditioned on a team's last performance.

This yields the two-state Markov chain illustrated in Figure 4.8, where

$$ S = \{\text{win}, \text{lose}\}. $$

The resulting transition matrix is

$$ P = \begin{bmatrix} 0.7 & 0.3 \\ 0.4 & 0.6 \end{bmatrix}. $$

There are a large number of teams in major league baseball, so

$$ p^0 = [p_w^0, p_\ell^0], \quad p_w^0 + p_\ell^0 = 1, $$

is the percentage of teams who won and lost their last games. To illustrate, we take $p^0 = [0.8, 0.2]$. We assume a schedule in which teams play at different times, so $p_w^0$ and $p_\ell^0$ do not both have to be 0.5.

Figure 4.8. Markov chain quantifying the probability of winning or losing based on the last performance.

The percentage of teams who win/lose their next game is given by

$$ p^1 = [0.8, 0.2] \begin{bmatrix} 0.7 & 0.3 \\ 0.4 & 0.6 \end{bmatrix} = [0.64, 0.36], $$

so the distribution after $n$ games is

$$ p^n = [0.8, 0.2] \begin{bmatrix} 0.7 & 0.3 \\ 0.4 & 0.6 \end{bmatrix}^n. $$

The distributions for $n = 0, \ldots, 10$ are compiled in Table 4.1. These numerical results indicate that the distribution is converging to a stationary value.

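For this example, we can explicitly compute a limiting distribution $\pi$ by solving the constrained relation

$$ \pi = \pi P, \quad \sum_i \pi_i = 1, $$

that is,

$$ [\pi_{\text{win}}, \pi_{\text{lose}}] = [\pi_{\text{win}}, \pi_{\text{lose}}] \begin{bmatrix} 0.7 & 0.3 \\ 0.4 & 0.6 \end{bmatrix}, \quad \pi_{\text{win}} + \pi_{\text{lose}} = 1, $$

to obtain

$$ \pi = [0.5714, 0.4286]. $$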
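Because the chain has only two states, the constrained relation can also be solved by hand. The short derivation below uses only the entries of $P$ given above. Either component of $\pi = \pi P$ yields the same relation.

$$ \pi_{\text{win}} = 0.7\,\pi_{\text{win}} + 0.4\,\pi_{\text{lose}} \;\Longrightarrow\; 0.3\,\pi_{\text{win}} = 0.4\,\pi_{\text{lose}} \;\Longrightarrow\; \pi_{\text{win}} = \tfrac{4}{3}\,\pi_{\text{lose}}, $$

$$ \pi_{\text{win}} + \pi_{\text{lose}} = 1 \;\Longrightarrow\; \pi = \left[\tfrac{4}{7},\ \tfrac{3}{7}\right] \approx [0.5714,\ 0.4286]. $$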

In general, however, we cannot solve explicitly for a stationary value and must instead establish the manner in which $p^n$ converges to $\pi$. We next discuss the nature of this convergence and summarize criteria that guarantee the existence of a unique limiting value.


 n    $p^n$               n    $p^n$               n    $p^n$
 0    [0.8000, 0.2000]    4    [0.5733, 0.4267]    8    [0.5714, 0.4286]
 1    [0.6400, 0.3600]    5    [0.5720, 0.4280]    9    [0.5714, 0.4286]
 2    [0.5920, 0.4080]    6    [0.5716, 0.4284]    10   [0.5714, 0.4286]
 3    [0.5776, 0.4224]    7    [0.5715, 0.4285]

Table 4.1. Iteration and distributions for Example 4.51.
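The entries of Table 4.1 can be regenerated by repeatedly applying $p^n = p^{n-1}P$. The following is a minimal Python sketch that uses plain lists and no external libraries. The helper names `step` and `propagate` are illustrative and do not come from the text.

```python
# Reproduce Table 4.1: propagate p^n = p^{n-1} P for the win/lose chain
# of Example 4.51, starting from p^0 = [0.8, 0.2].

def step(p, P):
    """One step of the chain: row vector p times transition matrix P."""
    k = len(p)
    return [sum(p[i] * P[i][j] for i in range(k)) for j in range(k)]


def propagate(p0, P, n_steps):
    """Return the list [p^0, p^1, ..., p^{n_steps}]."""
    history = [list(p0)]
    for _ in range(n_steps):
        history.append(step(history[-1], P))
    return history


P = [[0.7, 0.3],   # from "win":  [win, lose]
     [0.4, 0.6]]   # from "lose": [win, lose]
p0 = [0.8, 0.2]

table = propagate(p0, P, 10)
for n, pn in enumerate(table):
    print(f"{n:2d}  [{pn[0]:.4f}, {pn[1]:.4f}]")
```

Because each row of $P$ sums to one, every $p^n$ printed this way also sums to one, up to floating-point rounding.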
As detailed in Section 4.4, it does not make sense to directly consider limits of random variables. Instead, we consider the limit

$$ \lim_{n \to \infty} p^n = \pi, $$

which is convergence in distribution. We note that, if this limit exists, it must satisfy

$$ \pi = \lim_{n\to\infty} p^{n+1} = \lim_{n\to\infty} p^0 P^{n+1} = \Big(\lim_{n\to\infty} p^0 P^n\Big) P = \pi P. $$

Definition 4.52 (Stationary Distribution). For a Markov chain with transition kernel $P$, distributions $\pi$ that satisfy

$$ \pi = \pi P \qquad (4.38) $$

are termed equilibrium or stationary distributions of the chain. In a measure theoretic framework, $\pi$ is an invariant measure.

For every finite Markov chain, there exists at least one stationary distribution. However, it may not be unique, and it may not be equal to $\lim_{n\to\infty} p^n$. Criteria necessary to establish a unique limiting distribution $\pi = \lim_{n\to\infty} p^n$ are motivated by the following definitions and examples.


Definition 4.53 (Irreducible Markov Chain). A Markov chain is irreducible if any state $x_j$ can be reached from any other state $x_i$ in a finite number of steps; that is, for all states $x_i, x_j$, we have $p_{ij}^{(m)} > 0$ for some finite $m$. Otherwise it is reducible.
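Irreducibility depends only on which entries of $P$ are positive. It can therefore be checked as a reachability question on the directed graph that has an edge $x_i \to x_j$ whenever $p_{ij} > 0$. The following is a minimal Python sketch using breadth-first search. The function names and the 3-state absorbing matrix are hypothetical illustrations; the second matrix is the chain of Example 4.51.

```python
from collections import deque


def reachable_from(P, i):
    """Set of states reachable from state i in one or more steps (edges with P[a][b] > 0)."""
    k = len(P)
    seen = set()
    queue = deque([i])
    while queue:
        a = queue.popleft()
        for b in range(k):
            if P[a][b] > 0 and b not in seen:
                seen.add(b)
                queue.append(b)
    return seen


def is_irreducible(P):
    """Definition 4.53: every state x_j is reachable from every other state x_i."""
    k = len(P)
    return all(j in reachable_from(P, i) for i in range(k) for j in range(k) if j != i)


P_win_lose = [[0.7, 0.3],
              [0.4, 0.6]]          # Example 4.51

P_absorbing = [[0.5, 0.5, 0.0],    # hypothetical chain: state 3 (index 2)
               [0.2, 0.3, 0.5],    # is absorbing, so states 1 and 2 can
               [0.0, 0.0, 1.0]]    # never be reached from it

print(is_irreducible(P_win_lose))   # True
print(is_irreducible(P_absorbing))  # False
```

Only the sign pattern of $P$ matters here. The numerical values would matter only for the stationary distribution itself.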


Example 4.54. Consider the Markov chain depicted in Figure 4.9(a) with the transition matrix

$$ P = \begin{bmatrix} 0 & 0 & 0 & 1 \\ \frac{1}{3} & \frac{1}{3} & \frac{1}{3} & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}. $$


The chain is clearly reducible since $p_{3j}^{(m)} = 0$ for $j = 1, 2, 4$ and all $m$. Furthermore, it is easy to verify that $\pi = [0, 0, 1, 0]$ and $\pi = [0, 0, 0, 1]$ are both stationary distributions. The property of irreducibility is required to guarantee that $\pi$ is unique.


Figure 4.9. (a) Reducible chain for Example 4.54 and (b) periodic chain for Example 4.56.
Definition 4.55 (Periodic Markov Chain). A Markov chain is periodic if parts of the state space are visited at regular intervals. The period $k$ is defined as

$$ k = \gcd\{m \mid p_{ii}^{(m)} > 0\} = \gcd\{m \mid P(X_{n+m} = x_i \mid X_n = x_i) > 0\}. $$

The chain is aperiodic if $k = 1$.

Example 4.56. The Markov chain depicted in Figure 4.9(b) with the transition matrix

$$ P = \begin{bmatrix} 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ \frac{1}{2} & 0 & 0 & \frac{1}{2} & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 & 0 \end{bmatrix} $$

has the unique stationary distribution $\pi = [1/6, 1/6, 1/3, 1/6, 1/6]$. It is established in Exercise 4.8 that if $p^0 = [1/2, 0, 0, 1/2, 0]$, then $p^3 = p^6 = p^9 = \cdots = p^0$, so the period is $k = 3$. Because mass cycles through the chain at a regular interval, $p^n$ does not converge, so $\lim_{n\to\infty} p^n$ does not exist. Furthermore, it is demonstrated in Exercise 4.9 that if the limit of a periodic chain exists for one initial distribution, other distributions can yield different limits. Hence aperiodicity is required to guarantee that the limit exists.
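The claims in Example 4.56 can be checked exactly with rational arithmetic. The following is a minimal Python sketch that uses `fractions.Fraction` to avoid rounding. It checks that $\pi P = \pi$ and that $p^3 = p^0$ for $p^0 = [1/2, 0, 0, 1/2, 0]$. It then computes each state's period as the gcd of the observed return times $m \le$ `max_steps`. The truncation at `max_steps` is an assumption of the sketch and is adequate for this small chain.

```python
from fractions import Fraction
from math import gcd

HALF = Fraction(1, 2)

# Transition matrix of Example 4.56 (states x_1, ..., x_5 -> indices 0, ..., 4).
P = [[0, 1, 0, 0, 0],
     [0, 0, 1, 0, 0],
     [HALF, 0, 0, HALF, 0],
     [0, 0, 0, 0, 1],
     [0, 0, 1, 0, 0]]


def row_times(p, P):
    """Row vector times matrix: the one-step update p -> p P."""
    k = len(p)
    return [sum(p[i] * P[i][j] for i in range(k)) for j in range(k)]


def matmul(A, B):
    """Plain matrix product A B for lists of lists."""
    return [[sum(A[i][l] * B[l][j] for l in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def period(P, i, max_steps=60):
    """gcd of the return times m <= max_steps with (P^m)_{ii} > 0; 0 if no return is seen."""
    k = 0
    Pm = P
    for m in range(1, max_steps + 1):
        if Pm[i][i] > 0:
            k = gcd(k, m)
        Pm = matmul(Pm, P)
    return k


pi = [Fraction(1, 6), Fraction(1, 6), Fraction(1, 3), Fraction(1, 6), Fraction(1, 6)]
print(row_times(pi, P) == pi)            # True: pi is stationary

p0 = [HALF, 0, 0, HALF, 0]
p = p0
for _ in range(3):
    p = row_times(p, P)
print(p == p0)                           # True: mass returns to p^0 after 3 steps

print([period(P, i) for i in range(5)])  # [3, 3, 3, 3, 3]
```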

For infinite chains, one must additionally include conditions regarding the persistence or recurrence of states. However, we will focus on finite Markov chains, for which it can be shown that if the chain is irreducible, all states are positive persistent [121].

Before providing a theorem that establishes the convergence $\lim_{n\to\infty} p^n = \pi$, we summarize relevant results from matrix theory.

Definition 4.57. A $k \times k$ matrix $A$ is

(i) nonnegative, denoted by $A \ge 0$, if $a_{ij} \ge 0$ for all $i, j$, and

(ii) strictly positive, denoted by $A > 0$, if $a_{ij} > 0$ for all $i, j$.

Theorem 4.58 (Perron-Frobenius). Let $A$ be a $k \times k$ nonnegative matrix such that $A^m > 0$ for some $m \ge 1$. Then

(i) $A$ has a positive eigenvalue $\lambda_0$ with corresponding left eigenvector $x_0$, where the entries of $x_0$ are positive;

(ii) if $\lambda \ne \lambda_0$ is any other eigenvalue of $A$, then $|\lambda| < \lambda_0$;

(iii) $\lambda_0$ has geometric and algebraic multiplicity 1.

There are several statements of the Perron-Frobenius theorem, and details and proofs can be found in [121, 130, 220].
Theorem 4.59. For all finite stochastic matrices $P$, the largest eigenvalue is $\lambda_0 = 1$.

See [121] for a proof of this theorem.

Theorem 4.60. Let $P$ be a finite transition matrix for an irreducible aperiodic Markov chain. Then there exists $M \ge 1$ such that $P^m > 0$ for all $m \ge M$.


Further details are provided in [121], and the theorem is illustrated in Exercise 4.10. The following theorem establishes the convergence of the Markov chain.
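Theorem 4.60 can be explored numerically by searching for the first power of $P$ that is strictly positive. The following is a minimal Python sketch. The 2-state matrix is a hypothetical irreducible, aperiodic chain with a self-loop on its second state. The 5-state matrix is the periodic chain of Example 4.56, for which no power is strictly positive. The search cap `max_power` is an assumption of the sketch.

```python
def matmul(A, B):
    """Plain matrix product A B for lists of lists."""
    return [[sum(A[i][l] * B[l][j] for l in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def first_positive_power(P, max_power=100):
    """Smallest m <= max_power such that every entry of P^m is > 0, else None."""
    Pm = P
    for m in range(1, max_power + 1):
        if all(entry > 0 for row in Pm for entry in row):
            return m
        Pm = matmul(Pm, P)
    return None


# Hypothetical irreducible, aperiodic chain: state 2 has a self-loop.
P_aperiodic = [[0.0, 1.0],
               [0.5, 0.5]]

# Periodic chain of Example 4.56 (period 3): zero entries persist in every power.
P_periodic = [[0.0, 1.0, 0.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.0, 0.0],
              [0.5, 0.0, 0.0, 0.5, 0.0],
              [0.0, 0.0, 0.0, 0.0, 1.0],
              [0.0, 0.0, 1.0, 0.0, 0.0]]

print(first_positive_power(P_aperiodic))  # 2
print(first_positive_power(P_periodic))   # None
```

Here $P^2 = \begin{bmatrix} 0.5 & 0.5 \\ 0.25 & 0.75 \end{bmatrix} > 0$ for the 2-state chain. For the periodic chain, structural zeros remain exact zeros in floating point, so the `None` result reflects the periodicity rather than rounding.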


Theorem 4.61. Every finite, homogeneous Markov chain that is irreducible and aperiodic, with transition matrix $P$, has a unique stationary distribution $\pi$. Moreover, chains converge in the sense of distributions, $\lim_{n\to\infty} p^n = \pi$, for every initial distribution $p^0$.

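Proof. It follows from Theorems 4.58, 4.59, and 4.60 that the largest eigenvalue of $P$ is $\lambda_0 = 1$, which has multiplicity 1. There is thus a unique left eigenvector $\pi$ that satisfies $\pi P = \pi$ and $\sum_i \pi_i = 1$. To establish the convergence, we first consider the eigendecomposition, assuming (as the argument below does) that $P$ is diagonalizable:

$$ UPV = \Lambda = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_k \end{bmatrix}, $$

where $1 > |\lambda_2| \ge \cdots \ge |\lambda_k|$ and $V = U^{-1}$. It follows that

$$ \lim_{n\to\infty} P^n = \lim_{n\to\infty} V \Lambda^n U = V \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 0 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & 0 \end{bmatrix} U, $$

which is the outer product of the first column of $V$ and the first row of $U$. Furthermore, we observe that $UP = \Lambda U$ implies that the rows of $U$ are left eigenvectors of $P$, so the first row can be scaled to be $\pi$:

$$ U = \begin{bmatrix} \pi_1 & \cdots & \pi_k \\ u_{21} & \cdots & u_{2k} \\ \vdots & & \vdots \\ u_{k1} & \cdots & u_{kk} \end{bmatrix}. $$

Then $V = U^{-1}$ implies that

$$ UV = \begin{bmatrix} \pi_1 & \cdots & \pi_k \\ u_{21} & \cdots & u_{2k} \\ \vdots & & \vdots \\ u_{k1} & \cdots & u_{kk} \end{bmatrix} \begin{bmatrix} 1 & v_{12} & \cdots & v_{1k} \\ \vdots & \vdots & & \vdots \\ 1 & v_{k2} & \cdots & v_{kk} \end{bmatrix} = I, $$

since $\sum_i \pi_i = 1$ and, for $j \ge 2$, $\sum_i u_{ji} = 0$. The latter follows by multiplying $u_j P = \lambda_j u_j$ on the right by the vector of ones $\mathbf{1}$ and using $P\mathbf{1} = \mathbf{1}$, which gives $(1 - \lambda_j)\, u_j \mathbf{1} = 0$ with $\lambda_j \ne 1$. This establishes that the first column of $V$ is all ones, so $\lim_{n\to\infty} P^n = \mathbf{1}\pi$. Finally,

$$ \lim_{n\to\infty} p^n = \lim_{n\to\infty} p^0 P^n = [p_1^0 \cdots p_k^0] \begin{bmatrix} 1 \\ \vdots \\ 1 \end{bmatrix} [\pi_1 \cdots \pi_k] = \Big(\sum_i p_i^0\Big)\, \pi = \pi, $$

thus establishing the required convergence.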
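The proof shows that the error decays like $|\lambda_2|^n$. For the two-state chain of Example 4.51, the eigenvalues of $P$ are $\lambda_0 = 1$ and $\lambda_2 = 0.3$ (the trace is 1.3 and the determinant is 0.3). A direct calculation confirms the geometric rate $0.3^n$. It uses only $p^n_{\text{lose}} = 1 - p^n_{\text{win}}$ and $p^0 = [0.8, 0.2]$.

$$ p^n_{\text{win}} = 0.7\,p^{n-1}_{\text{win}} + 0.4\,\big(1 - p^{n-1}_{\text{win}}\big) = 0.4 + 0.3\,p^{n-1}_{\text{win}}, $$

$$ p^n_{\text{win}} - \tfrac{4}{7} = 0.3\left(p^{n-1}_{\text{win}} - \tfrac{4}{7}\right) \;\Longrightarrow\; p^n_{\text{win}} = \tfrac{4}{7} + \left(0.8 - \tfrac{4}{7}\right)(0.3)^n = \tfrac{4}{7} + \tfrac{8}{35}\,(0.3)^n. $$

For $n = 1$, this gives $4/7 + 0.0686 = 0.64$. For $n = 10$, the deviation from $4/7$ is below $10^{-5}$. Both values are consistent with Table 4.1.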


Theorem 4.61 establishes that finite Markov chains which are irreducible and aperiodic will converge to a stationary distribution $\pi$. However, it is often difficult or impossible to solve for $\pi$ using the relations $\pi P = \pi$ subject to $\sum_i \pi_i = 1$. The detailed balance condition provides an alternative that is straightforward to implement in MCMC methods, where the goal is to construct Markov chains whose stationary distribution $\pi$ is the posterior distribution for parameters.

Definition 4.62 (Detailed Balance). A chain with transition matrix $P = [p_{ij}]$ and distribution $\pi = [\pi_1, \ldots, \pi_k]$ is reversible if the detailed balance condition

$$ \pi_i p_{ij} = \pi_j p_{ji} \qquad (4.39) $$

is satisfied for all $i, j$. Since

$$ \sum_i \pi_i p_{ij} = \sum_i \pi_j p_{ji} = \pi_j \sum_i p_{ji} = \pi_j, $$

it follows immediately that $\pi P = \pi$, so reversibility implies stationarity. Hence if the chains are irreducible and aperiodic, they will uniquely limit to this specified stationary distribution. In Chapter 8, we use the Metropolis algorithm to construct chains that satisfy (4.39) and converge to the posterior density.
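Reversibility is sufficient for stationarity but not necessary. The following minimal Python sketch illustrates this with exact rational arithmetic. The chain of Example 4.51, with $\pi = [4/7, 3/7]$, satisfies (4.39). The periodic chain of Example 4.56 has a stationary $\pi$ but violates detailed balance. The function names are illustrative.

```python
from fractions import Fraction as F


def is_stationary(pi, P):
    """Check pi P == pi exactly (entries are Fractions or ints)."""
    k = len(pi)
    return all(sum(pi[i] * P[i][j] for i in range(k)) == pi[j] for j in range(k))


def satisfies_detailed_balance(pi, P):
    """Check pi_i p_ij == pi_j p_ji for all i, j, i.e., condition (4.39)."""
    k = len(pi)
    return all(pi[i] * P[i][j] == pi[j] * P[j][i] for i in range(k) for j in range(k))


# Win/lose chain of Example 4.51 with its stationary distribution [4/7, 3/7].
P_ball = [[F(7, 10), F(3, 10)],
          [F(4, 10), F(6, 10)]]
pi_ball = [F(4, 7), F(3, 7)]

# Periodic chain of Example 4.56 with pi = [1/6, 1/6, 1/3, 1/6, 1/6].
h = F(1, 2)
P_cycle = [[0, 1, 0, 0, 0],
           [0, 0, 1, 0, 0],
           [h, 0, 0, h, 0],
           [0, 0, 0, 0, 1],
           [0, 0, 1, 0, 0]]
pi_cycle = [F(1, 6), F(1, 6), F(1, 3), F(1, 6), F(1, 6)]

print(is_stationary(pi_ball, P_ball), satisfies_detailed_balance(pi_ball, P_ball))      # True True
print(is_stationary(pi_cycle, P_cycle), satisfies_detailed_balance(pi_cycle, P_cycle))  # True False
```

In the periodic chain, probability flows from $x_1$ to $x_2$ but never back, so $\pi_1 p_{12} = 1/6 \ne 0 = \pi_2 p_{21}$. This is why MCMC constructions enforce (4.39) directly instead of relying on $\pi P = \pi$ alone.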


4.7 Random versus Stochastic Differential Equations 

We briefly illustrate here the difference between random differential equations, which we consider throughout this text, and stochastic differential equations. This is done in part to counter a growing trend in the uncertainty quantification community to treat these terms as synonymous when in fact they are distinctly different and require completely different techniques for analysis and approximation.
Definition 4.63 (Random Differential Equation). Random differential equations are those in which random effects are manifested in parameters, initial or boundary conditions, or forcing conditions that are regular (e.g., continuous) with respect to time and space. An example is the ODE

$$ \frac{dz}{dt} = -a(\omega)\, z + b(t,\omega), \quad z(0) = z_0(\omega), $$

which has the solution

$$ z(t,\omega) = e^{-a(\omega)t} z_0(\omega) + \int_0^t e^{-a(\omega)(t-s)}\, b(s,\omega)\, ds. $$

We emphasize that $b(t,\omega)$ is a random process, as defined in Definition 4.42, with the additional requirement that for an outcome $\omega \in \Omega$, the sample path $b(t,\omega)$ is taken to be smooth, e.g., in $C[0, t_f]$. This guarantees that sample paths of the solution $z(t,\omega)$ are at least differentiable functions, as illustrated in Figure 4.10.

In summary, for each realization $\omega \in \Omega$, random differential equations are analyzed and solved sample path by sample path using the theory of standard differential equations [91, 138, 231]. The goal pursued in Chapters 9 and 10 is to determine distributions or uncertainty bounds for $z(t,\omega)$ based on those of inputs such as parameters or initial and boundary conditions.
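The sample-path viewpoint can be illustrated with a small Monte Carlo computation. For each draw of the random parameter, the ordinary ODE is solved, and the results are then averaged. The following is a minimal Python sketch under hypothetical simplifying assumptions: $b \equiv 0$, $z_0 = 1$, and $a(\omega) \sim \mathcal{U}(0.5, 1.5)$. Under these assumptions, each path is $z(t,\omega) = e^{-a(\omega)t}$, and $E[z(t)]$ has a closed form for comparison.

```python
import math
import random


def z_path(t, a, z0):
    """Solution of dz/dt = -a z, z(0) = z0, for one fixed realization a = a(omega)."""
    return z0 * math.exp(-a * t)


def monte_carlo_mean(t, a_lo, a_hi, z0, n_samples, seed=0):
    """Average z(t, omega) over sample paths with a(omega) ~ U(a_lo, a_hi)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        a = rng.uniform(a_lo, a_hi)      # one realization of the random parameter
        total += z_path(t, a, z0)        # solve the deterministic ODE for that realization
    return total / n_samples


def exact_mean(t, a_lo, a_hi, z0):
    """E[z0 exp(-a t)] for a ~ U(a_lo, a_hi), obtained by integrating over a."""
    return z0 * (math.exp(-a_lo * t) - math.exp(-a_hi * t)) / ((a_hi - a_lo) * t)


est = monte_carlo_mean(t=1.0, a_lo=0.5, a_hi=1.5, z0=1.0, n_samples=20000)
ref = exact_mean(t=1.0, a_lo=0.5, a_hi=1.5, z0=1.0)
print(est, ref)   # ref = exp(-0.5) - exp(-1.5), about 0.3834
```

Every sample path here is smooth in $t$, which is the property that distinguishes this setting from the SDE case below.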


Definition 4.64 (Stochastic Differential Equation). The role of uncertainty is fundamentally different in stochastic differential equations (SDEs). In this case, the differential equations are forced by an irregular process such as a Wiener process or Brownian motion. SDEs are typically written symbolically in terms of stochastic differentials, but they are interpreted as Itô or Stratonovich stochastic integrals. For example, fluctuations in $Z(t)$ due to a Wiener process $W$ could be formulated as

$$ dZ(t) = -aZ(t)\,dt + b\,dW(t), $$

which is interpreted as

$$ Z(t) = Z_0 - \int_0^t aZ(s)\,ds + \int_0^t b\,dW(s), $$

where the second integral is an Itô stochastic integral.

Figure 4.10. Realizations of (a) a random differential equation and (b) sample paths of an SDE.

As illustrated in Figure 4.10, the solutions of SDEs exhibit nondifferentiable sample paths due to the irregularity of the driving Wiener process. We do not further consider SDEs in this text but rather include this definition to delineate them from random differential equations. The reader is referred to [91, 138] for further details about SDEs.
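For contrast, sample paths of the example SDE can be generated with the Euler-Maruyama scheme. This scheme is a standard method but is not discussed in the text. The noise enters through Wiener increments of size $\sqrt{\Delta t}$ rather than $\Delta t$, which is the source of the rough paths. The following is a minimal Python sketch. The values $Z_0 = 1$, $a = 1$, $b = 0.5$ and the step and path counts are hypothetical. The computed moments at $t = 1$ are compared with the known mean $Z_0 e^{-at}$ and variance $\frac{b^2}{2a}(1 - e^{-2at})$ of this linear (Ornstein-Uhlenbeck) equation.

```python
import math
import random


def euler_maruyama_path(z0, a, b, t_final, n_steps, rng):
    """One sample path of dZ = -a Z dt + b dW on [0, t_final] by Euler-Maruyama."""
    dt = t_final / n_steps
    z = z0
    path = [z]
    for _ in range(n_steps):
        dW = rng.gauss(0.0, math.sqrt(dt))   # Wiener increment W(t+dt) - W(t) ~ N(0, dt)
        z = z - a * z * dt + b * dW
        path.append(z)
    return path


rng = random.Random(1)
z0, a, b, t_final, n_steps, n_paths = 1.0, 1.0, 0.5, 1.0, 200, 2000
finals = [euler_maruyama_path(z0, a, b, t_final, n_steps, rng)[-1] for _ in range(n_paths)]

mean_T = sum(finals) / n_paths
var_T = sum((z - mean_T) ** 2 for z in finals) / (n_paths - 1)

# Exact moments of Z(t_final) for this linear (Ornstein-Uhlenbeck) SDE.
exact_mean = z0 * math.exp(-a * t_final)
exact_var = b**2 / (2 * a) * (1 - math.exp(-2 * a * t_final))
print(mean_T, exact_mean)
print(var_T, exact_var)
```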


4.8 Statistical Inference 

The goal in statistical inference is to deduce the structure of, or make conclusions about, a phenomenon based on observed data. This often involves the determination of an unknown distribution based on observed data, in which case the problem of statistical inference can be stated as follows. Given a set

$$ S = \{x_1, \ldots, x_n\}, \quad x_j \in \mathbb{R}^N, $$

of observed realizations of a random variable $X$, we want to infer the underlying probability distribution that produces the data $S$.

Statistical inference can be roughly categorized as being parametric or nonparametric in nature. In parametric approaches, one assumes that the underlying distributions can be adequately described in terms of a parametric relation having a relatively small number of parameters, e.g., mean and variance. The inference problem is to estimate those parameters or the distribution of those parameters. This approach has the advantage of a typically small number of parameters but the disadvantage of limited accuracy if the assumed functional relation is incorrect. In nonparametric approaches, one does not presuppose a functional form but instead describes or constructs the distribution based solely on properties of the observations. This avoids errors associated with incorrect parametric relations but requires that some structure be imposed on algorithms to ensure that reasonable distributions are determined.

4.8.1 Frequentist versus Bayesian Inference 

Frequentist and Bayesian inference differ in the underlying assumptions made regarding the nature of probabilities, models, parameters, and confidence intervals. As detailed in [30], each approach, or a hybrid combination of the two, is advantageous for certain problems or applications. Hence it is necessary that scientists understand both.
From a frequentist perspective, probabilities are defined as the frequencies with which an event occurs if the experiment is repeated a large number of times. Hence they are objective and are not updated as data is acquired. Parameters are considered to be unknown but fixed; hence they are deterministic. To statistically establish confidence in the estimation process, one constructs estimators, such as OLS estimators or maximum likelihood estimators, to estimate the parameters in the manner detailed in Section 4.3. Based on either the assumption of normality for the errors or asymptotic theory resulting from the central limit theorem, one can then construct sampling distributions and confidence intervals for the parameter estimators.


The interpretation of confidence intervals in the framework of frequentist inference is often a source of confusion. As detailed in Definition 4.31, a 90% confidence interval has the following interpretation: in repeated procedures, 90% of realized intervals would include the true parameter $q_0$. In model calibration, this means that if the estimation procedure is repeated a large number of times using data having the same error statistics, and a 90% interval estimate is computed each time, then 90% of the intervals would include $q_0$, as illustrated in Figure 4.11(a). The sampling distribution and confidence intervals thus quantify the accuracy and variability of the estimation procedure rather than providing a density for the parameter. Hence they do not provide a direct measure of parameter uncertainty.
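The repeated-procedure interpretation can be checked by simulation. The following is a minimal Python sketch under hypothetical assumptions. The data are $v_j \sim N(q_0, \sigma^2)$ with known $\sigma$. The estimator is the sample mean, and the 90% interval is $\hat q \pm z_{0.95}\,\sigma/\sqrt{n}$. The statement "90%" refers to the fraction of repetitions whose interval contains the fixed $q_0$. It is not a probability statement about $q_0$ itself.

```python
import math
import random
from statistics import NormalDist


def ci_coverage(q0, sigma, n_obs, n_reps, level=0.90, seed=0):
    """Fraction of repeated experiments whose level-% confidence interval contains q0.

    Each experiment draws n_obs measurements v_j ~ N(q0, sigma^2) with sigma known,
    estimates q0 by the sample mean, and forms q_hat +/- z * sigma / sqrt(n_obs).
    """
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2)     # about 1.645 for a 90% interval
    half_width = z * sigma / math.sqrt(n_obs)
    hits = 0
    for _ in range(n_reps):
        data = [rng.gauss(q0, sigma) for _ in range(n_obs)]
        q_hat = sum(data) / n_obs
        if q_hat - half_width <= q0 <= q_hat + half_width:
            hits += 1
    return hits / n_reps


coverage = ci_coverage(q0=2.0, sigma=0.5, n_obs=20, n_reps=4000)
print(f"fraction of intervals containing q0: {coverage:.3f}")  # close to 0.90
```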

Because parameters are fixed, but unknown, values in this framework, frequentist analysis cannot be directly applied to obtain parameter densities that can be propagated through models to quantify model uncertainty. In some problems, the sampling distributions may be similar to parameter distributions, but this needs to be verified either experimentally or using Bayesian analysis. This is discussed in more detail in Chapter 7.

Probabilities are treated as possibly subjective in the Bayesian framework, and they can be updated to reflect new information. Moreover, they are considered to be a distribution rather than a single frequency value. Similarly, parameters are considered to be random variables with associated densities, and the solution of the parameter estimation problem is the posterior probability density. The Bayesian perspective is thus natural for model uncertainty quantification since it provides densities that can be propagated through models. The interpretation of interval estimates, termed credible intervals, is also natural in the Bayesian framework.



Figure 4.11. Interpretation of (a) a frequentist 90% confidence interval and (b) a Bayesian 90% credible interval.
Definition 4.65 (Credible Interval). The $(1-\alpha) \times 100\%$ credible interval is that which has a $(1-\alpha) \times 100\%$ chance of containing the expected parameter. A 90% credible interval is illustrated in Figure 4.11(b).

We next provide details regarding Bayesian inference to provide the back- 
ground necessary for Chapter 8. 


4.8.2 Bayesian Inference


Bayesian inference is based on the supposition that probabilities, and more generally our state of knowledge regarding an observed phenomenon, can be updated as additional information is obtained. In the context of parametric models, parameters are treated as random variables having associated densities.

Bayes' formula

$$ P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)} $$

for probabilities provides a natural genesis for Bayesian inference. In the context of parameters $Q = [Q_1, \ldots, Q_p]$ that are quantified based on observations $v = [v_1, \ldots, v_n]$, one employs the relation

$$ \pi_Q(q \mid v) = \frac{\pi_\Upsilon(v \mid q)\, \pi_0(q)}{\pi_\Upsilon(v)}, \qquad (4.40) $$

where $\pi_0(q)$ and $\pi_Q(q \mid v)$ respectively denote the prior and posterior densities, $\pi_\Upsilon(v \mid q)$ is a likelihood, and the marginal density $\pi_\Upsilon(v)$ is a normalization factor. Here $q = Q(\omega)$ denotes realizations of $Q$. The subscripts that indicate specific random variables are typically dropped from the prior and posterior in Bayesian analysis.
The prior density $\pi_0(q)$ quantifies any prior knowledge that may be known about the parameter before data is taken into account. For example, one might have prior information based on similar previous models, data that is similar to previous data, or initial parameter densities that have been determined through other means, such as related experiments.


It is common in model calibration, however, that one does not have such prior information, so one instead uses what is termed a noninformative prior. A common choice of noninformative prior is the uniform density, or unnormalized uniform, imposed on the parameter support. For example, one might employ

$$ \pi_0(q) = \chi_{[0,\infty)}(q) $$

for a positive parameter. This choice is improper in the sense that the integral of $\pi_0(q)$ is unbounded. It is recommended that a noninformative prior be used unless good previous information is known, since it is shown in Example 4.66 that incorrect prior information can degrade (4.40) far more than a noninformative prior.

Readers are referred to Section 3.4 for discussion regarding notation conventions for parameters in the mathematics, statistics, engineering, and science literature. For example, $\theta$ is typically used to denote calibration parameters in statistics, whereas $q$ is commonly employed in the mathematics literature.
In "empirical Bayes" inference, one also encounters data-dependent priors, in which priors estimated using frequentist techniques such as maximum likelihood are employed in the Bayesian model. It is argued in [35] that this double use of data is problematic with small sample sizes and is at odds with the tenets of Bayesian analysis.

The term $\pi(v \mid q)$, which is a function of $q$ with $v$ fixed, quantifies the likelihood $\mathcal{L}(q \mid v)$ of observing $v$ given parameter realizations $q$, as detailed in Section 4.3.2. We will illustrate various choices for the likelihood function in the examples at the end of this section and at the beginning of Chapter 8. The joint density is given by

$$ \pi(q, v) = \pi(v \mid q)\, \pi_0(q) $$

and is normalized to unity by the marginal density function $\pi_\Upsilon(v)$ of all possible observations.

Finally, the posterior density $\pi(q \mid v)$ quantifies the probability of obtaining parameters $q$ given observations $v$. It is the posterior density that we will be estimating using the Bayesian parameter estimation techniques of Chapter 8, and we point out that the data directly informs the posterior only through the likelihood. Representing $\pi_\Upsilon(v)$ as the integral of the joint density over all possible parameter values yields the Bayes relation

$$ \pi(q \mid v) = \frac{\pi(v \mid q)\, \pi_0(q)}{\int_{\mathbb{R}^p} \pi(v \mid q)\, \pi_0(q)\, dq}, \qquad (4.41) $$

commonly employed for model calibration and data assimilation.

A significant issue, which will be discussed in detail in Chapter 8, concerns the evaluation of the normalizing integral. It can be analytically evaluated only in special cases, and classical tensored quadrature techniques are effective only in low dimensions, e.g., $p < 4$. This has spawned significant research on high-dimensional quadrature techniques, including adaptive sparse grids for moderate dimensionality and Monte Carlo techniques for high dimensions; see Chapter 11.


Example 4.66. To illustrate (4.41) in a setting where the posterior density can be computed explicitly, we consider the results from tossing a possibly biased coin. The random variable

$$ \Upsilon_i(\omega) = \begin{cases} 0, & \omega = T, \\ 1, & \omega = H, \end{cases} $$

represents the result from the $i$th toss, and the parameter $q$ is the probability of getting heads. We now consider the probability of obtaining $N_1$ heads and $N_0$ tails in a series of $N = N_0 + N_1$ flips of the coin.

Because coin flips are independent events with only two possible outcomes, the likelihood of observing a sequence $v = [v_1, \ldots, v_N]$, given the probability $q$, is

$$ \pi(v \mid q) = \prod_{i=1}^N q^{v_i} (1-q)^{1-v_i} = q^{\sum v_i} (1-q)^{N - \sum v_i} = q^{N_1} (1-q)^{N_0}, $$

which is simply a scaled binomial density. We consider first a noninformative prior

$$ \pi_0(q) = \begin{cases} 1, & 0 \le q \le 1, \\ 0, & \text{else}, \end{cases} $$

which yields the posterior density

$$ \pi(q \mid v) = \frac{q^{N_1}(1-q)^{N_0}}{\int_0^1 q^{N_1}(1-q)^{N_0}\, dq} = \frac{(N+1)!}{N_0!\, N_1!}\, q^{N_1}(1-q)^{N_0}. $$

We note that in this special case, the denominator is the integral of a beta function, which admits an analytic solution. In general, however, quadrature techniques must be employed to approximate the integral.

For a fair coin with $q_0 = \frac{1}{2}$, the posterior densities associated with various realizations $N_1$ and $N_0$ are plotted in Figure 4.12. It is first observed that Bayesian inference yields a posterior density with just one experiment, whereas frequentist analysis would specify a probability of either 0 or 1. It is also observed that the variability of $\pi(q \mid v)$ decreases as $N$ increases. Finally, the manner in which the data informs the density is illustrated by comparing the results with 5 Heads and 9 Tails, which has a mode of 0.36, to those of 49 Heads and 51 Tails, which has a mode of 0.495. This illustrates that the method achieves the goal of letting the data inform the posterior when there is no prior information.


We next illustrate the effect of a poor choice for the prior density. For the same fair coin ($q_0 = \frac{1}{2}$), we consider the choice

$$ \pi_0(q) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(q-\mu)^2/2\sigma^2} $$

with $\mu = 0.3$ and $\sigma = 0.1$. We cannot analytically evaluate the denominator in this case, so we instead employ Gaussian quadrature. As illustrated in Figure 4.13, even for a realization of 50 Heads and 50 Tails, the mode of the posterior is still smaller than $q_0 = \frac{1}{2}$, although it is significantly better than the result for 5 Heads and 5 Tails. This illustrates the manner in which a poor informative prior can have a negative impact even for a large number of observations. Hence if the validity of an informative prior is in doubt, it is recommended that a noninformative prior be used instead.
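Both experiments in this example can be reproduced by evaluating the numerator of (4.41) on a grid and normalizing it numerically. The following is a minimal Python sketch. It uses the composite trapezoid rule on a uniform grid of $[0, 1]$ as a stand-in for the Gaussian quadrature mentioned above; this substitution is an assumption of the sketch. With the uniform prior, the result can be checked against the closed form $\frac{(N+1)!}{N_0!\,N_1!} q^{N_1}(1-q)^{N_0}$.

```python
import math


def posterior_on_grid(n_heads, n_tails, prior, n_grid=2001):
    """Posterior pi(q|v), proportional to q^N1 (1-q)^N0 pi_0(q), on a uniform grid over [0, 1].

    The normalizing integral in (4.41) is approximated by the composite
    trapezoid rule (an assumption; the text uses Gaussian quadrature).
    """
    h = 1.0 / (n_grid - 1)
    qs = [i * h for i in range(n_grid)]
    unnorm = [q**n_heads * (1.0 - q)**n_tails * prior(q) for q in qs]
    Z = h * (sum(unnorm) - 0.5 * (unnorm[0] + unnorm[-1]))
    return qs, [u / Z for u in unnorm]


def mode(qs, dens):
    """Grid point at which the density is largest."""
    return qs[max(range(len(qs)), key=lambda i: dens[i])]


def uniform_prior(q):
    """Noninformative prior: pi_0(q) = 1 on [0, 1]."""
    return 1.0


def poor_gaussian_prior(q, mu=0.3, sigma=0.1):
    """Informative but poor prior for a fair coin: the N(0.3, 0.1^2) density."""
    return math.exp(-(q - mu)**2 / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))


qs, dens = posterior_on_grid(5, 9, uniform_prior)
print(round(mode(qs, dens), 3))   # about 0.357, i.e., N1/N = 5/14

for heads, tails in [(5, 5), (50, 50)]:
    m = mode(*posterior_on_grid(heads, tails, poor_gaussian_prior))
    print(heads, tails, round(m, 3))   # both modes lie below q0 = 0.5
```

With the poor prior, the mode moves toward $1/2$ as the data grow from 5 + 5 to 50 + 50 tosses, but it does not reach $1/2$. This matches the behavior described for Figure 4.13.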


Figure 4.12. Posterior densities associated with a noninformative prior for three realizations of the coin toss experiment.

Figure 4.13. Posterior densities associated with a poor informative prior for two realizations of the coin toss experiment (5 Heads, 5 Tails and 50 Heads, 50 Tails).


Conjugate Priors

Definition 4.67 (Conjugacy). The property that the prior and posterior distributions have the same parametric form is termed conjugacy. When this occurs, the prior $\pi_0(q)$ is termed a conjugate prior for the likelihood $\pi(v \mid q)$. Parameters in the prior relation are often termed prior hyperparameters to distinguish them from the model parameters $q$. The corresponding parameters in the posterior relation are called posterior hyperparameters.

The use of conjugate priors, when possible, is advantageous since closed-form expressions for the posterior are then available. This will be used when estimating densities for measurement errors in Chapter 8.


Example 4.68. Consider the binomial model

$$ \pi(v \mid q) = q^{N_1}(1-q)^{N - N_1}, \quad N_1 = \sum_{i=1}^N v_i, $$

used for the likelihood in the coin toss Example 4.66. We observe that if the prior is parameterized similarly, the product of the prior and likelihood will be in the same family. Specifically, we take $\pi_0(q)$ to be a beta density with hyperparameters $\alpha$ and $\beta$, so that $\pi_0(q) \propto q^{\alpha-1}(1-q)^{\beta-1}$, as shown in Definition 4.16. It then follows that the posterior density satisfies

$$ \pi(q \mid v) \propto q^{N_1}(1-q)^{N-N_1}\, q^{\alpha-1}(1-q)^{\beta-1} = q^{N_1+\alpha-1}(1-q)^{N-N_1+\beta-1}, $$

so it is a beta density with shape parameters $N_1 + \alpha$ and $N - N_1 + \beta$. The beta prior distribution is thus a conjugate family for the binomial likelihood.
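Conjugacy can be confirmed numerically. The prior-times-likelihood product is normalized by quadrature and compared pointwise with the Beta$(N_1 + \alpha, N - N_1 + \beta)$ density. The following is a minimal Python sketch. The hyperparameters $\alpha = 2$, $\beta = 3$ and the counts $N_1 = 7$, $N_0 = 4$ are hypothetical. The midpoint rule is used so that $\log q$ is never evaluated at $q = 0$ or $q = 1$.

```python
import math


def beta_pdf(q, a, b):
    """Beta(a, b) density on (0, 1), normalized via log-gamma."""
    log_c = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_c + (a - 1) * math.log(q) + (b - 1) * math.log(1 - q))


def conjugate_update(alpha, beta, n_heads, n_tails):
    """Posterior hyperparameters for a Beta(alpha, beta) prior and binomial likelihood."""
    return alpha + n_heads, beta + n_tails


# Hypothetical prior hyperparameters and data.
alpha, beta = 2.0, 3.0
n_heads, n_tails = 7, 4
a_post, b_post = conjugate_update(alpha, beta, n_heads, n_tails)

# Numerical posterior: likelihood * prior, normalized with the midpoint rule.
n_cells = 4000
h = 1.0 / n_cells
qs = [(i + 0.5) * h for i in range(n_cells)]
unnorm = [q**n_heads * (1 - q)**n_tails * beta_pdf(q, alpha, beta) for q in qs]
Z = h * sum(unnorm)
numerical = [u / Z for u in unnorm]

max_err = max(abs(num - beta_pdf(q, a_post, b_post)) for q, num in zip(qs, numerical))
print(a_post, b_post, max_err)   # 9.0 7.0 and a tiny discretization error
```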


Example 4.69. Here we consider normally distributed random variables with known mean $\mu$ and unknown variance $\sigma^2$. As detailed in Section 4.3.2, the likelihood of observing $v = [v_1, \ldots, v_n]$ iid measurements under these assumptions is

$$ \pi(v \mid \sigma^2) = \frac{1}{(2\pi\sigma^2)^{n/2}}\, e^{-SS/2\sigma^2}, $$

where the sum of squares error is

$$ SS = \sum_{j=1}^n (v_j - \mu)^2. $$

This likelihood is in the inverse-gamma family, defined in Definition 4.14, so the conjugate prior is $\pi_0(\sigma^2) \propto (\sigma^2)^{-(\alpha+1)} e^{-\beta/\sigma^2}$. The posterior density can then be expressed as

$$ \pi(\sigma^2 \mid v) \propto \pi(v \mid \sigma^2)\, \pi_0(\sigma^2) \propto (\sigma^2)^{-(\alpha+1+n/2)}\, e^{-(\beta + SS/2)/\sigma^2}, $$

so that

$$ \sigma^2 \mid v \sim \text{Inv-gamma}(\alpha + n/2,\ \beta + SS/2). $$

As shown in Definitions 4.13 and 4.14, if $X \sim \text{Gamma}(\alpha, \beta)$, then $Y = X^{-1} \sim \text{Inv-gamma}(\alpha, \beta)$. This equivalence can be exploited so that the MATLAB command gamrnd.m can be used to generate random numbers from a gamma distribution, which can then be used to construct random values from an inverse-gamma distribution.
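The same gamma-to-inverse-gamma trick is available in Python's standard library through `random.gammavariate`. The following is a minimal sketch of the conjugate update in this example followed by posterior sampling. The data, $\mu$, and the prior hyperparameters $\alpha = \beta = 1$ are hypothetical. One detail needs care: `gammavariate(alpha, beta)` treats `beta` as a scale parameter. A Gamma variable with rate $b$ is therefore drawn as `gammavariate(a, 1/b)`. The sample mean of the draws is compared with the Inv-gamma$(a, b)$ mean $b/(a-1)$.

```python
import random


def sample_inv_gamma(shape, scale, n, rng):
    """n draws from Inv-gamma(shape, scale), density proportional to x^-(shape+1) exp(-scale/x).

    Uses Y = 1/X with X ~ Gamma(shape, rate=scale). Python's random.gammavariate(alpha, beta)
    treats beta as a SCALE parameter, so the rate `scale` is passed as 1/scale.
    """
    return [1.0 / rng.gammavariate(shape, 1.0 / scale) for _ in range(n)]


rng = random.Random(2)

# Hypothetical data: n = 50 iid measurements with known mean mu.
mu, sigma_true, n = 0.0, 2.0, 50
v = [rng.gauss(mu, sigma_true) for _ in range(n)]
SS = sum((vj - mu) ** 2 for vj in v)

# Hypothetical prior hyperparameters, then the conjugate update of Example 4.69.
alpha0, beta0 = 1.0, 1.0
alpha_post = alpha0 + n / 2
beta_post = beta0 + SS / 2

draws = sample_inv_gamma(alpha_post, beta_post, 20000, rng)
mc_mean = sum(draws) / len(draws)
exact_mean = beta_post / (alpha_post - 1)   # mean of Inv-gamma(a, b) when a > 1
print(mc_mean, exact_mean)
```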


4.9 Notes and References 

This chapter provides an overview of statistical topics that play a role in uncertainty quantification, and we necessarily leave details to the following references. The text [62] provides a very accessible introduction to probability, point and interval estimation, hypothesis testing, analysis of variance, and linear regression with clearly stated definitions. The texts [112, 171] are also excellent sources for obtaining an overview of probability and statistics at an upper undergraduate level. The book [96] delineates the difference between estimators and estimates by using different notation and is an excellent source for details regarding linear regression. Finally, [81, 82] are classics in the field of probability.

There are a number of excellent supplemental texts on random processes and Markov chains, including [99, 121, 126, 131, 158, 184, 251]. Additional theory, examples, and numerical algorithms for random equations and SDEs can be found in [91, 138, 186, 231]. We note that, due to the mathematical nature of the underlying framework required for SDEs, these latter texts also provide a measure theoretic framework for random variables and other concepts discussed in this chapter. Additional details regarding a measure theoretic basis for aspects of this material can be found in [36].

The reader is referred to [56, 224] for introductory concepts and examples regarding Bayesian analysis and computing, and to [84, 92] for a more in-depth treatment of Bayesian inference and MCMC techniques. The text [128] provides an introduction to Bayesian inference in the context of inverse problems.
4.10 Exercises

Exercise 4.1. Use the definition of the mean and variance to prove the relations (4.12) and (4.16) when $n = 2$.


Exercise 4.2. Let $X \sim \mathcal{U}(a, b)$ be a uniformly distributed random variable. Show that the mean and variance are

$$ E(X) = \frac{a+b}{2}, \quad \text{var}(X) = \frac{(b-a)^2}{12}. $$

Exercise 4.3. For $z \in [-1, 1]$ and $x \in [a, b]$, show that the function $f$ given by

$$ x = f(z) = \frac{a+b}{2} + \frac{b-a}{2}\, z $$

is a one-to-one and onto mapping from $[-1, 1]$ to $[a, b]$.


Exercise 4.4. Let $Y$ be a random variable, and let $c$ and $d$ be real numbers. Show that

$$ E(cY + d) = cE(Y) + d, $$

$$ \text{var}(cY + d) = c^2\, \text{var}(Y). $$

Exercise 4.5. Let $Y$ be a random $p$-vector, $Z$ be a random $n$-vector, and $A \in \mathbb{R}^{n \times p}$ be a deterministic and known matrix. Use (4.12) and (4.16) to establish that

$$ E(AY + Z) = AE(Y) + E(Z), $$

$$ V(AY) = AV(Y)A^T, $$

where $V(Y)$ denotes the covariance matrix for $Y$. This is Theorem 4.16 in [96].

Exercise 4.6. Let $Z \sim \mathcal{U}(-1,1)$ and $X \sim \mathcal{U}(a,b)$ be uniformly distributed random
variables with respective means $\mu_Z = 0$, $\mu_X = (a+b)/2$ and variances $\sigma_Z^2 = 1/3$,
$\sigma_X^2 = (b-a)^2/12$. Use the results of Exercises 4.2 and 4.3 to show that

$$ X = \mu_X + \frac{\sigma_X}{\sigma_Z}\, Z. \tag{4.42} $$

In this manner, a uniform random variable on the interval $[a,b]$ can be expressed in
terms of one defined on $[-1,1]$. This is important since the Legendre polynomials
employed in Chapter 10 and associated Gauss–Legendre quadrature formulae are
defined on $[-1,1]$.
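To illustrate why this change of variables matters for quadrature, the following Python sketch (assuming NumPy) maps Gauss–Legendre nodes from $[-1,1]$ to $[a,b]$ with the affine map of Exercise 4.3 and uses them to approximate $E[g(X)]$ for $X \sim \mathcal{U}(a,b)$. The function name, the interval $[2,5]$, and the choice of 10 nodes are illustrative assumptions, not part of the exercise.

```python
import numpy as np

def expect_uniform_gl(g, a, b, n_nodes=10):
    """Approximate E[g(X)] for X ~ U(a, b) with n-point Gauss-Legendre quadrature.

    Nodes z_i and weights w_i live on [-1, 1]; the affine map of Exercise 4.3,
    x = (a + b)/2 + (b - a)/2 * z, carries the nodes to [a, b].  Since the density
    of Z ~ U(-1, 1) is 1/2, E[g(X)] = (1/2) * integral of g(f(z)) over [-1, 1].
    """
    z, w = np.polynomial.legendre.leggauss(n_nodes)
    x = 0.5 * (a + b) + 0.5 * (b - a) * z
    return 0.5 * np.sum(w * g(x))

# Check the moments of Exercise 4.2 for a hypothetical interval [a, b] = [2, 5].
a, b = 2.0, 5.0
mean = expect_uniform_gl(lambda x: x, a, b)
var = expect_uniform_gl(lambda x: (x - mean) ** 2, a, b)
print(mean, (a + b) / 2)       # quadrature vs. exact mean
print(var, (b - a) ** 2 / 12)  # quadrature vs. exact variance
```

Because a 10-point rule integrates polynomials up to degree 19 exactly, both moments agree with the closed forms to rounding error.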

Exercise 4.7. Let $f_{X_1,X_2}(x_1,x_2)$ denote the pdf for a bivariate normal with covariance
matrix $V = \sigma^2 I$, where $I$ is the $2 \times 2$ identity matrix. Compute the marginal
density $f_{X_2}(x_2)$ and conditional density $f_{X_1|X_2}(x_1|x_2)$, where $x_2$ is fixed. Compare
your results with Figure 4.6.


Exercise 4.8. Consider the Markov chain from Example 4.56. Numerically verify
that $\pi = [1/6, 1/6, 1/3, 1/6, 1/6]$ is a stationary distribution. Write a program to
show that if $p^0 = [1/2, 0, 0, 1/2, 0]$, then $p^3 = p^6 = \cdots = p^0$, so that the period
is $k = 3$. Because the Markov chain is periodic, $\lim_{n \to \infty} p^n$ does not exist.
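Exercises 4.8 and 4.9 ask for small programs. The Python sketch below (assuming NumPy) shows one way to check stationarity, $\pi P = \pi$, and to detect periodicity by propagating $p^{k+1} = p^k P$. Since the transition matrix of Example 4.56 is not reproduced here, the code uses a hypothetical three-state cyclic chain with period 3; substituting the matrix from the example carries out the exercise.

```python
import numpy as np

# Hypothetical 3-state cyclic chain 0 -> 1 -> 2 -> 0 (NOT the chain of Example 4.56).
P = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [1.0, 0.0, 0.0]])

def stationary_distribution(P):
    """Solve pi P = pi: left eigenvector of P for eigenvalue 1, scaled to sum to one."""
    evals, evecs = np.linalg.eig(P.T)
    k = np.argmin(np.abs(evals - 1.0))
    pi = np.real(evecs[:, k])
    return pi / pi.sum()

def propagate(p0, P, steps):
    """Return [p^0, p^1, ..., p^steps] with p^{k+1} = p^k P (row vectors)."""
    ps = [np.asarray(p0, dtype=float)]
    for _ in range(steps):
        ps.append(ps[-1] @ P)
    return ps

pi = stationary_distribution(P)
print(np.allclose(pi @ P, pi))                 # stationarity check
ps = propagate([1.0, 0.0, 0.0], P, 6)
print([np.allclose(p, ps[0]) for p in ps])     # returns to p^0 every 3 steps
```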
Exercise 4.9. Consider the Markov chain with the transition matrix


P = 


ii 
l 

3 

n 

i 

u 2 


1 

0 

0 

0 


Write a program to investigate the limiting behavior of the chain, and consider
the initial distribution $p^0 = [1/4, 1/4, 1/4, 1/4]$. Does the chain appear to converge?
Now consider the initial distribution $p^0 = [1, 0, 0, 0]$ and see if you get the same limit.
Show that the chain is periodic, and determine its period. Note that existence of
a limit with one initial distribution does not establish its existence for all initial
conditions if the chain is periodic.

Exercise 4.10. Consider the Markov chain with the transition matrix


P = 


1 3 

IF 3 

1 0 


which is nonnegative. Show numerically that the chain is irreducible and aperiodic
and that $P^m > 0$ for $m \ge 2$, as established in Theorem 4.60. Numerically determine

$$ \lim_{m \to \infty} P^m. $$
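For Exercise 4.10, a similar sketch (again assuming NumPy) searches for the smallest power $m$ with $P^m > 0$ entrywise and approximates $\lim_{m\to\infty} P^m$ by a high matrix power. The $2 \times 2$ matrix below is a hypothetical example, chosen only because it has a zero entry while $P^2 > 0$; it is not the matrix of the exercise.

```python
import numpy as np

# Hypothetical nonnegative transition matrix (an assumption, not the matrix of Exercise 4.10):
# P itself has a zero entry, but P^2 is strictly positive.
P = np.array([[0.5, 0.5],
              [1.0, 0.0]])

def first_positive_power(P, max_power=50):
    """Smallest m with all entries of P^m > 0, or None if none is found up to max_power.

    Such an m exists for an irreducible, aperiodic finite chain.
    """
    Pm = np.eye(P.shape[0])
    for m in range(1, max_power + 1):
        Pm = Pm @ P
        if np.all(Pm > 0):
            return m
    return None

m0 = first_positive_power(P)
Pinf = np.linalg.matrix_power(P, 200)   # numerical approximation of lim P^m
print(m0)      # 2 for this P
print(Pinf)    # each row approaches the stationary distribution [2/3, 1/3]
```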




Chapter 5 


Representation of Random Inputs


In Section 4.5, we illustrated that solutions of differential equations with uncertain
parameters are random processes, whereas in Chapters 7 and 8, we provide
statistical techniques to construct parameter densities using measured data. We
detail here techniques for transforming random algebraic and differential equations
into problems posed in terms of inputs (e.g., parameters, initial conditions, or
boundary conditions) having densities that are constructed either experimentally
or determined using the techniques of Chapters 7 and 8.

To motivate, consider the PDE

$$
\begin{aligned}
&\frac{\partial T}{\partial t} = \frac{\partial}{\partial x}\left(\alpha(x,\omega)\frac{\partial T}{\partial x}\right) + f(t,x), && -1 < x < 1,\ t > 0,\\
&T(t,-1,\omega) = T_\ell(\omega), \quad T(t,1,\omega) = T_r(\omega), && t > 0,\\
&T(0,x,\omega) = T_0(\omega), && -1 < x < 1,
\end{aligned}
\tag{5.1}
$$

which, as detailed in Example 3.5, models the temperature $T$ of a structure with
uncertain diffusivity $\alpha$, boundary conditions $T_\ell$ and $T_r$, and initial conditions $T_0$. In
general, each parameter will have an associated probability space but, to simplify
notation, we assume a single probability space $(\Omega, \mathcal{F}, P)$. Finally, the response $y$ is
taken to be the temperature $T$ at each point $(t,x)$ in the domain.

There are two problems associated with quantifying the randomness of
inputs in (5.1): $\alpha(x,\omega)$ is infinite-dimensional, and we cannot directly quantify the
probability associated with events $\omega \in \Omega$. We address the first in Section 5.3 by
assuming that random fields and processes can be adequately approximated by
finite-dimensional expansions. To address the second issue, we illustrate in
Section 5.1 techniques to pose the problem in terms of mutually independent random
variables with associated densities.


5.1 Mutually Independent Random Parameters 

We consider first problems with $p$ mutually independent parameters $Q = [Q_1, \ldots, Q_p]$,
where each parameter $Q_i(\omega) : \Omega \to \mathbb{R}$ has an associated density $\rho_{Q_i}(q_i)$. We denote
the range of $Q_i$ by $\Gamma_i \equiv Q_i(\Omega) \subset \mathbb{R}$ and take $\Gamma \equiv \prod_{i=1}^{p} \Gamma_i$. Since the parameters are
assumed to be mutually independent, the joint density $\rho_Q(q) : \Gamma \to \mathbb{R}$ is

$$ \rho_Q(q) = \prod_{i=1}^{p} \rho_{Q_i}(q_i). \tag{5.2} $$

Because every realization $\omega \in \Omega$ yields a value of the random vector $Q$ in $\Gamma$, we
can reformulate the problem in the image probability space $(\Gamma, \mathcal{B}(\Gamma), \rho_Q(q)\,dq)$ rather
than the abstract probability space $(\Omega, \mathcal{F}, P)$. Here $\mathcal{B}(\Gamma)$ is the Borel $\sigma$-algebra on
$\Gamma$ and $\rho_Q(q)\,dq$ is the measure of $Q$.
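Under the independence assumption, (5.2) means that the joint density can be evaluated, and $Q$ sampled, one marginal at a time. A minimal Python sketch, assuming SciPy's `scipy.stats` distributions; the three marginals are hypothetical choices for illustration only.

```python
import numpy as np
from scipy import stats

# Hypothetical marginals for p = 3 mutually independent parameters (illustrative only).
marginals = [stats.uniform(loc=0.5, scale=1.0),   # Q_1 ~ U(0.5, 1.5)
             stats.norm(loc=20.0, scale=2.0),     # Q_2 ~ N(20, 2^2)
             stats.norm(loc=30.0, scale=2.0)]     # Q_3 ~ N(30, 2^2)

def joint_density(q):
    """rho_Q(q) = prod_i rho_{Q_i}(q_i); valid only under mutual independence (5.2)."""
    return np.prod([m.pdf(qi) for m, qi in zip(marginals, q)])

def sample_joint(n, rng):
    """Draw n samples of Q by sampling each marginal independently (one row per sample)."""
    return np.column_stack([m.rvs(size=n, random_state=rng) for m in marginals])

rng = np.random.default_rng(0)
Q = sample_joint(5, rng)
print(Q.shape)                              # (5, 3)
print(joint_density([1.0, 20.0, 30.0]))     # product of the three marginal densities
```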

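Example 5.1. Consider (5.1) with constant diffusivity $\alpha(x,\omega) = \alpha(\omega)$. We seek
$T(t,x;Q) : [0,t_f] \times [-1,1] \times \Gamma \to \mathbb{R}$, which solves

$$
\begin{aligned}
&\frac{\partial T}{\partial t} = \alpha \frac{\partial^2 T}{\partial x^2} + f(t,x), && -1 < x < 1,\ t > 0,\\
&T(t,-1,Q) = T_\ell, \quad T(t,1,Q) = T_r, && t > 0,\\
&T(0,x,Q) = T_0, && -1 < x < 1,
\end{aligned}
$$

where $Q = [\alpha, T_\ell, T_r, T_0]$ is the vector of mutually independent random inputs or
parameters.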


5.2 Correlated Random Parameters 

The assumption of mutually independent parameters permits the representation
(5.2) for the joint density $\rho_Q(q)$. This assumption is required for the stochastic
Galerkin and discrete projection methods of Chapter 10 and is necessary for
stochastic collocation unless the joint density can be directly constructed. This
assumption also underlies many sampling methods and is required to establish the
Gaussian behavior of responses constructed using perturbation methods.

Unfortunately, the parameters for even simple physical models are often
correlated and hence dependent. For example, Figure 8.12 illustrates that Q and h
in the steady-state heat model (8.22) are highly correlated, whereas Figure 8.15
illustrates correlation between 5 and X ll d | in the HIV model (8.24). 

As detailed in Section 6.3, correlation between parameters is fundamentally
different from parameter nonidentifiability, and it typically cannot be addressed by
reparameterization of the model. This is easily observed for the heat model.

One strategy for correlated parameters is to seek a transformation that yields
a new set of independent parameters. As detailed in [266], this transformation is
the Cholesky decomposition of the covariance matrix if parameters are normally
distributed. For non-Gaussian parameter distributions, the transformations are
typically nonlinear, and specific techniques are determined by the available density
information. For applications where marginal distributions and a correlation
matrix are provided, but a joint density is not available, one can employ the Nataf
transformation detailed in [74]. This is the method that is presently implemented
in the Sandia National Laboratories toolkit DAKOTA for use with stochastic spectral
methods.⁴ In the first step, the Nataf transformation is used to pose the
problem in terms of correlated Gaussian random variables. Second, a Cholesky
decomposition is applied to obtain a representation formulated in terms of mutually
independent Gaussian random variables. We note that, in practice, an identity
matrix is often employed in lieu of a correlation matrix if this information is not
known. The degree to which this degrades the accuracy of the representation
depends on the level of correlation. If the joint distribution is known, a Rosenblatt
transformation is typically a better alternative [210]. However, this information is
rarely available for complex problems.
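For normally distributed parameters, the Cholesky-based decorrelation described above can be sketched in Python with NumPy. If $V = LL^T$ is the Cholesky factorization of the covariance matrix, then $Q = \mu + LZ$ with independent standard normal $Z$ has covariance $V$, and $Z = L^{-1}(Q - \mu)$ recovers independent variables. The mean and covariance below are hypothetical values chosen for illustration.

```python
import numpy as np

# Hypothetical mean and covariance for two strongly correlated Gaussian parameters.
mu = np.array([1.0, 2.0])
V = np.array([[1.0, 0.9],
              [0.9, 1.0]])

L = np.linalg.cholesky(V)                  # lower-triangular factor, V = L L^T

rng = np.random.default_rng(1)
Z = rng.standard_normal((200_000, 2))      # independent N(0, 1) parameters
Q = mu + Z @ L.T                           # each row is q = mu + L z (correlated)

# Inverse map: recover the independent variables from the correlated samples.
Z_rec = np.linalg.solve(L, (Q - mu).T).T

print(np.cov(Q, rowvar=False))             # approximately V
print(np.cov(Z_rec, rowvar=False))         # approximately the identity
```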

For problems in which marginal densities and correlation matrices are unavailable,
one can sample from the parameter chains constructed using the Markov chain
techniques of Chapter 8 to construct densities or prediction intervals for responses
or QoI. Random sampling based on the chain indices has the advantage that it
eliminates the requirement of mutually independent parameters.
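Sampling by chain index can be sketched as follows (Python, NumPy). Whole rows of the chain are drawn, so each sample is a joint draw and the correlations among parameters are carried through to the responses. The `chain` array and the model passed in are hypothetical placeholders for the output of the Chapter 8 MCMC algorithms and the model of interest; the resulting interval reflects parameter uncertainty only and does not include measurement error.

```python
import numpy as np

def response_interval_from_chain(chain, model, n_samples, rng, level=0.95):
    """Interval for a scalar response obtained by sampling whole chain rows.

    chain: (M, p) array whose rows are joint parameter draws; sampling rows by
    index keeps the parameters' dependence structure intact.
    model: callable mapping a p-vector to a scalar response.
    """
    idx = rng.integers(0, chain.shape[0], size=n_samples)
    responses = np.array([model(q) for q in chain[idx]])
    tail = 0.5 * (1.0 - level)
    return np.quantile(responses, [tail, 1.0 - tail])

# Hypothetical "chain": two perfectly anticorrelated parameters.
rng = np.random.default_rng(4)
z = rng.standard_normal(10_000)
chain = np.column_stack([z, -z])
interval = response_interval_from_chain(chain, lambda q: q[0] + q[1], 2_000, rng)
print(interval)   # the anticorrelation cancels exactly, so both ends are 0
```

Sampling the two columns independently would instead give a response variance of about 2, which illustrates why the joint (row-wise) draws matter.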

5.3 Finite-Dimensional Representation of Random Coefficients

We assume that infinite-dimensional random coefficients $\alpha(t,x,\omega)$ can be adequately
approximated by expansions of the form

$$ \alpha(t,x,\omega) \approx \bar{\alpha}(t,x) + \sum_{n=1}^{N} Q_n(\omega)\, \Phi_n(t,x), \tag{5.4} $$

where $\bar{\alpha}(t,x) = E[\alpha(t,x,\omega)]$, $Q = [Q_1(\omega), \ldots, Q_N(\omega)] \in \mathbb{R}^N$ is a vector of mutually
independent random variables, and $\Phi_n(t,x)$ are basis functions. There are
three significant challenges associated with constructing these expansions: ensuring
that the coefficients $Q_n(\omega)$ are mutually independent, maintaining relatively small
$N$, and determining densities $\rho_{Q_n}(q_n)$ for each coefficient.

One approach is to specify appropriate basis functions $\Phi_n(t,x)$ (e.g., splines
or finite elements) and apply the Bayesian techniques of Chapter 8 to construct
densities for the coefficients $Q_n$. The difficulty is that $N$ will be very large if
representing random processes in $\mathbb{R}^4$. For example, even a very coarse grid of 10 points
in each space and time dimension will yield $N = 10^4$.

Karhunen–Loève expansions provide an alternative for correlated random
processes. This technique is also known as proper orthogonal decomposition (POD)
and, in finite-dimensional settings, principal component analysis (PCA).


Karhunen–Loève Expansions

To illustrate, consider a correlated second-order random field $\alpha(x,\omega)$ defined
for $x \in \mathcal{D}$ with mean $\bar{\alpha}(x)$ and covariance function $C(x,y)$, as defined in (4.34).
The Karhunen–Loève expansion of $\alpha$ is

$$ \alpha(x,\omega) = \bar{\alpha}(x) + \sum_{n=1}^{\infty} \sqrt{\lambda_n}\, \phi_n(x)\, Q_n(\omega), \tag{5.5} $$






where $\lambda_n$ and $\phi_n$ are the eigenvalues and orthonormal eigenfunctions of $C$; that is,
they solve the integral equation

$$ \int_{\mathcal{D}} C(x,y)\, \phi_n(y)\, dy = \lambda_n \phi_n(x) \tag{5.6} $$

for $x \in \mathcal{D}$. From the orthogonality of the eigenfunctions $\phi_n$, it follows that the random
variables $Q_n(\omega)$ are given by

$$ Q_n(\omega) = \frac{1}{\sqrt{\lambda_n}} \int_{\mathcal{D}} [\alpha(x,\omega) - \bar{\alpha}(x)]\, \phi_n(x)\, dx. \tag{5.7} $$

Hence they satisfy

$$ E(Q_n) = 0, \qquad E(Q_m Q_n) = \delta_{mn}, \tag{5.8} $$

so they are centered and uncorrelated.

For numerical implementation, one employs the truncated expansion

$$ \alpha^N(x,\omega) = \bar{\alpha}(x) + \sum_{n=1}^{N} \sqrt{\lambda_n}\, \phi_n(x)\, Q_n(\omega), \tag{5.9} $$

where the choice of $N$ is dictated by the decay rate of the eigenvalues $\lambda_n$. As
detailed in [148] and illustrated in Examples 5.2 and 5.3, the decay rate of $\lambda_n$
is directly related to the smoothness of $C$ and the correlation length $L$ of the process.
One typically chooses $N$ so that the sum of the neglected terms is sufficiently small
compared with the sum of the first $N$ terms.
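When (5.6) cannot be solved analytically, the eigenpairs in (5.9) are typically approximated numerically. The Python sketch below (NumPy only) uses a Nyström method with the midpoint rule on a uniform grid, which is one simple choice among many, and selects $N$ by requiring the retained eigenvalues to carry a prescribed fraction of the total, one way to make "sufficiently small" concrete. The grid size, the 95% threshold, the mean $\bar{\alpha}(x) = 1$, and the use of the exponential covariance of Example 5.3 with $L = 0.5$ are illustrative assumptions.

```python
import numpy as np

def kl_nystrom(cov, a, b, n_grid=400):
    """Midpoint-rule Nystrom approximation of the KL eigenpairs of cov on [a, b].

    Discretizes (5.6) as sum_j cov(x_i, x_j) phi(x_j) h = lambda phi(x_i).
    Eigenvectors are rescaled so that sum_i phi(x_i)^2 h = 1 (discrete orthonormality).
    """
    h = (b - a) / n_grid
    x = a + h * (np.arange(n_grid) + 0.5)          # midpoints of a uniform grid
    K = cov(x[:, None], x[None, :]) * h            # symmetric since weights are equal
    lam, vec = np.linalg.eigh(K)
    order = np.argsort(lam)[::-1]                  # decreasing eigenvalues
    lam, vec = lam[order], vec[:, order]
    return x, lam, vec / np.sqrt(h)

def choose_N(lam, frac=0.95):
    """Smallest N whose leading eigenvalues carry at least `frac` of the total sum."""
    lam = np.clip(lam, 0.0, None)
    return int(np.searchsorted(np.cumsum(lam) / lam.sum(), frac) + 1)

# Illustrative use: exponential covariance with L = 0.5 and mean alpha_bar(x) = 1.
L = 0.5
cov = lambda x, y: np.exp(-np.abs(x - y) / L) / (2.0 * L)
x, lam, phi = kl_nystrom(cov, -1.0, 1.0)
N = choose_N(lam)
rng = np.random.default_rng(2)
Q = rng.standard_normal(N)                             # independent N(0, 1) coefficients
alpha_N = 1.0 + phi[:, :N] @ (np.sqrt(lam[:N]) * Q)    # one realization of (5.9)
print(N, lam[:5])
```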

The second implementation issue concerns the specification of random parameters
$Q_n$. Whereas (5.7) illustrates the statistical structure of the random variables,
it is not useful for implementation in applications where $\alpha(x,\omega)$ is unknown and
must be constructed using statistical model calibration techniques.

For Gaussian processes, the situation is fairly simple since the $Q_n$ are Gaussian
random variables. As in Section 5.1, we consider $Q(\omega) = [Q_1(\omega), \ldots, Q_N(\omega)] \in \Gamma \subset \mathbb{R}^N$
in the image probability space $(\Gamma, \mathcal{B}(\Gamma), \rho_Q(q)\,dq)$, where $\rho_Q(q)$ is a multivariate
normal density with mean zero and, by (5.8), identity covariance. Furthermore, since
uncorrelated and independent are equivalent properties for jointly Gaussian random
variables, the $Q_n$ will be mutually independent, which is important from the perspective
of implementation.

For a non-Gaussian field $\alpha(x,\omega)$, uncorrelated does not necessarily imply
independent, but we can still represent $\alpha$ in terms of $N$ random Gaussian parameters
if a cumulative distribution function $F_\alpha$ for $\alpha$ is known or can be approximated. If we define
$\gamma(x,\omega) = F_\alpha^{-1}(\Phi(\beta(x,\omega)))$, where $\beta$ is a Gaussian random field with standard
normal marginals and $\Phi$ is the standard normal cumulative distribution function, then
arguments analogous to those in (4.11) establish that $\gamma$ and $\alpha$ have the same distribution.
With a truncated Karhunen–Loève expansion for $\beta$, the non-Gaussian field $\alpha$ can
be represented as

$$ \alpha^N(x,\omega) = F_\alpha^{-1}\left(\Phi\left(\bar{\beta}(x) + \sum_{n=1}^{N} \sqrt{\lambda_n}\, \phi_n(x)\, Q_n(\omega)\right)\right), $$

where $Q_n$ are normally distributed random variables. We note, however, that the
random parameters constructed in this manner may not be independent, which can
be detrimental to implementation. Algorithms based on mappings of this nature
are presented in [192].
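A pointwise sketch of the translation $F_\alpha^{-1}(\Phi(\cdot))$ in Python (NumPy and SciPy). The gamma marginal and the unit-variance exponential covariance used for the Gaussian field are hypothetical choices, and, for brevity, the Gaussian field is sampled with a Cholesky factor of its covariance matrix rather than a truncated Karhunen–Loève expansion.

```python
import numpy as np
from scipy import stats

def translate(beta, target):
    """Map standard normal values to the target marginal: F^{-1}(Phi(beta))."""
    return target.ppf(stats.norm.cdf(beta))

rng = np.random.default_rng(3)
target = stats.gamma(a=4.0, scale=0.5)       # hypothetical non-Gaussian marginal (mean 2)

# Pointwise check: if beta ~ N(0, 1), then F^{-1}(Phi(beta)) has the target distribution.
beta = rng.standard_normal(200_000)
alpha = translate(beta, target)
print(alpha.mean(), target.mean())

# Field version: a unit-variance Gaussian field on a grid, mapped pointwise.
x = np.linspace(-1.0, 1.0, 101)
C = np.exp(-np.abs(x[:, None] - x[None, :]) / 0.5)     # unit variance on the diagonal
Lc = np.linalg.cholesky(C + 1e-10 * np.eye(x.size))    # small jitter for robustness
field = translate(Lc @ rng.standard_normal(x.size), target)
```

The map preserves the marginal distribution exactly but, being nonlinear, generally changes the correlation structure of the field.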

Example 5.2 (Uncorrelated and Fully Correlated Random Processes). For
uncorrelated random processes, $C(x,y) = \delta(x-y)$ and (5.6) reduces to $\phi_n(x) = \lambda_n \phi_n(x)$,
so $\lambda_n = 1$ for all $n$. Hence the eigenvalues do not decay. Conversely,
$C(x,y) = 1$ for fully correlated processes, which yields

$$ \int_{\mathcal{D}} \phi_n(y)\, dy = \lambda_n \phi_n(x). $$

In this case, one can specify $\phi_1(x) = 1$, $\lambda_1 = \mathrm{length}(\mathcal{D})$, and $\lambda_n = 0$ for $n > 1$.
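The two limiting cases of Example 5.2 can be checked with a discretized eigenvalue problem (Python, NumPy). With midpoint weights $h$ on $\mathcal{D} = [-1,1]$, a discrete stand-in for $\delta(x-y)$ is $\delta_{ij}/h$, so the discretized operator is the identity, while $C = 1$ gives the rank-one matrix $h\,\mathbf{1}\mathbf{1}^T$. The grid size is an arbitrary choice.

```python
import numpy as np

n = 200
h = 2.0 / n                                 # midpoint weights on D = [-1, 1]

# Uncorrelated: discrete delta kernel delta_ij / h, so (delta_ij / h) * h = identity.
K_uncorr = np.eye(n)
# Fully correlated: C(x, y) = 1, so the weighted kernel matrix is h * ones.
K_corr = h * np.ones((n, n))

lam_uncorr = np.sort(np.linalg.eigvalsh(K_uncorr))[::-1]
lam_corr = np.sort(np.linalg.eigvalsh(K_corr))[::-1]
print(lam_uncorr[:3])   # all equal to 1: no decay
print(lam_corr[:3])     # first equals length(D) = 2, the rest vanish numerically
```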


Example 5.3. We revisit the heat equation (5.1), where $\alpha(x,\omega)$ is taken to be a
Gaussian random field with mean $\bar{\alpha}(x)$ and covariance function $C_\alpha(x,y)$. We also
assume that the random variables $T_\ell(\omega)$, $T_r(\omega)$, and $T_0(\omega)$ are mutually independent.

We employ the Karhunen–Loève expansion

$$ \alpha^N(x,Q) = \bar{\alpha}(x) + \sum_{n=1}^{N} \alpha_n(x)\, Q_n(\omega) \tag{5.10} $$

to approximate $\alpha$. The coefficients $\alpha_n(x) = \sqrt{\lambda_n}\, \phi_n(x)$ are specified in terms of
eigenvalues and eigenfunctions associated with the covariance function.

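To illustrate the analytic computation of $\lambda_n$ and $\phi_n$ for a special choice of
covariance function, we consider

$$ C(x,y) = \frac{1}{2L}\, e^{-|x-y|/L}, \tag{5.11} $$

where the factor $\frac{1}{2L}$ normalizes the integral of $C$ to unity so that it is a density.
The resulting integral equation is

$$ \int_{-1}^{1} \frac{1}{2L}\, e^{-|x-y|/L}\, \phi_n(y)\, dy = \lambda_n \phi_n(x). $$

Results in [94] can be used to establish that the even and odd eigenvalues and
eigenfunctions are

$$ \lambda_n = \frac{1}{1 + L^2 \gamma_n^2}, \qquad \phi_n(x) = \frac{\cos(\gamma_n x)}{\sqrt{1 + \dfrac{\sin(2\gamma_n)}{2\gamma_n}}} $$

and

$$ \lambda_n^* = \frac{1}{1 + L^2 (\gamma_n^*)^2}, \qquad \phi_n^*(x) = \frac{\sin(\gamma_n^* x)}{\sqrt{1 - \dfrac{\sin(2\gamma_n^*)}{2\gamma_n^*}}}, $$

where $\gamma_n$ and $\gamma_n^*$ are ordered solutions of the transcendental equation

$$ [1 - L\gamma_n \tan(\gamma_n)]\,[L\gamma_n^* + \tan(\gamma_n^*)] = 0. \tag{5.12} $$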
Note that solutions $\gamma_n$ and $\gamma_n^*$ to (5.12) can be easily computed by plotting the
functions to obtain initial values for root-finding algorithms. In general, such closed-form
eigenpair solutions cannot be computed, and one relies instead on numerical
solutions.
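Instead of reading initial values off a plot, one can bracket each root of (5.12) directly: for $\gamma > 0$, each even root lies in $(k\pi, k\pi + \pi/2)$ and each odd root in $(k\pi - \pi/2, k\pi)$, since on those intervals the corresponding factor changes sign and is monotone. The Python sketch below (NumPy and SciPy's `brentq`) uses these brackets; the number of roots computed and the offset `eps` that keeps the brackets away from the poles of $\tan$ are illustrative choices.

```python
import numpy as np
from scipy.optimize import brentq

def kl_exponential_eigenvalues(L, n_pairs=5):
    """Leading eigenvalues of C(x, y) = exp(-|x - y|/L)/(2L) on [-1, 1] via (5.12).

    Even roots of 1 - L g tan(g) = 0 lie in (k pi, k pi + pi/2), k = 0, 1, ...
    Odd roots of  L g + tan(g) = 0 lie in (k pi - pi/2, k pi),   k = 1, 2, ...
    Each root g gives an eigenvalue 1 / (1 + L^2 g^2).
    """
    eps = 1e-12                                   # keep brackets off the poles of tan
    even = [brentq(lambda g: 1.0 - L * g * np.tan(g),
                   k * np.pi + eps, k * np.pi + np.pi / 2 - eps)
            for k in range(n_pairs)]
    odd = [brentq(lambda g: L * g + np.tan(g),
                  k * np.pi - np.pi / 2 + eps, k * np.pi - eps)
           for k in range(1, n_pairs + 1)]
    roots = np.array(even + odd)
    lam = 1.0 / (1.0 + L**2 * roots**2)
    return np.sort(lam)[::-1]                     # decreasing order

print(kl_exponential_eigenvalues(1.0, 3))
print(kl_exponential_eigenvalues(1e-6, 3))        # all close to 1 for small L
```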

The eigenvalues obtained with various correlation lengths $L$ are plotted in
Figure 5.1. It is established in Exercise 5.1 that $C(x,y)$ limits to the Dirac density
$\delta(x-y)$ as $L \to 0$, and this is illustrated by the choice $L = 1 \times 10^{-6}$, which
yields eigenvalues that are approximately unity. A large correlation length $L$ yields
one eigenvalue of unity with the remaining eigenvalues limiting to zero, which is
consistent with the observation that the process becomes fully correlated in the
limit $L \to \infty$.

Based on the expansion (5.10), the finite-dimensional parameter set is

$$ Q = [Q_1, \ldots, Q_N, T_\ell, T_r, T_0], $$

so $p = N + 3$, and the approximate random system is

$$
\begin{aligned}
&\frac{\partial T}{\partial t} = \frac{\partial}{\partial x}\left(\alpha^N(x,Q)\frac{\partial T}{\partial x}\right) + f(t,x), && -1 < x < 1,\ t > 0,\\
&T(t,-1,Q) = T_\ell, \quad T(t,1,Q) = T_r, && t > 0,\\
&T(0,x,Q) = T_0, && -1 < x < 1.
\end{aligned}
\tag{5.13}
$$

Figure 5.1. First 11 eigenvalues for the covariance function (5.11) for various
choices of the correlation length $L$.


5.4 Exercises 

Exercise 5.1. Plot the covariance function $C(0,y) = \frac{1}{2L} e^{-|y|/L}$ for various values
of $L$, and illustrate that it behaves like the Dirac density $C(0,y) \approx \delta(y)$ for small
$L$ and the constant function $C(0,y) \approx 1$ for large $L$. For $j = \frac{1}{L}$, use the fact that
$C_j(0,y) = \frac{j}{2} e^{-j|y|}$ is a Dirac sequence [146] to prove that $C_j(0,y) \to \delta(y)$ as $j \to \infty$.


Chapter 6 

Parameter Selection Techniques


This chapter addresses techniques to isolate the set of identifiable or influential
parameters in models. These techniques are often termed subset selection, active
subspace, or essential subspace methods. Parameter selection is critical for the model
calibration techniques detailed in Chapters 7 and 8 and to reduce the dimensionality
of models for uncertainty propagation. For model calibration, unidentifiable
parameters cannot be uniquely estimated by frequentist or Bayesian inference using
noninformative priors. For example, we illustrated in Example 3.2 that the
parameters $q = [m, c, k]$ for the spring model (3.5) cannot be uniquely determined
using displacement data, whereas the reformulated model parameters $K = \frac{k}{m}$ and
$C = \frac{c}{m}$ are identifiable. For models such as those arising in systems biology or
neutron transport, the number of parameters can be in the millions, so techniques
to isolate influential inputs are critical to reduce the dimensionality of surrogate
models used for model calibration or uncertainty propagation.

The terms unidentifiable and noninfluential are often used synonymously in
the literature. As noted in the following definitions, however, the concepts differ.
There are also misconceptions regarding the relation between the statistical
concept of parameter correlation and the system input-output property of parameter
identifiability; we address this in Section 6.3.

Definition 6.1 (Identifiable Parameters). Consider the input-output map

$$ y = f(q), \qquad q = [q_1, \ldots, q_p]. $$

The parameter set $q$ is identifiable at $q^*$ if $f(q) = f(q^*)$ implies that $q = q^*$ for any
admissible $q \in \mathbb{Q}$. The parameter set $q$ is identifiable with respect to a space $I(q)$
if this holds for all $q^* \in I(q)$. We refer to $I(q)$ as the identifiable subspace. The
unidentifiable parameter space $NI(q)$ is the orthogonal complement⁵ of $I(q)$ with
regard to the admissible parameter space $\mathbb{Q}$ with the Euclidean inner product; that
is, $\mathbb{Q} = I(q) \oplus NI(q)$. Intuitively, a parameter set is identifiable if it is uniquely
determined by the observations, whereas it is unidentifiable if the same response
$y(q) = y(q^*)$ can be achieved with different parameter values $q \neq q^*$, as illustrated
in Figure 6.1(b).

⁵Let $S$ be a set of vectors in an inner product space $X$. The orthogonal complement $S^\perp$ to
$S$ is the set of vectors in $X$ that are orthogonal to all vectors in $S$. The space $X$ can always
be represented as the direct sum of a subspace $S$ and its orthogonal complement $S^\perp$, which is
expressed as $X = S \oplus S^\perp$.

Figure 6.1. Mapping $y = f(q)$ for an (a) identifiable, (b) unidentifiable, and
(c) noninfluential parameter.


Definition 6.2 (Influential Parameters). A parameter set $q$ is termed noninfluential
on the space $NI(q)$ if $|y(q) - y(q^*)| < \varepsilon$ for all $q$ and $q^* \in NI(q)$. As illustrated
in Figure 6.1(c), noninfluential parameters yield responses that are equal to within
a specified tolerance when evaluated at all values in $NI(q)$. These parameters can
thus be fixed for subsequent model calibration and uncertainty propagation. The
orthogonal complement is the space $I(q)$ of influential parameters.


Remark 6.3. The space of unidentifiable parameters is a subset of the space of
noninfluential parameters, since responses that are identical are, in particular, equal
to within any tolerance $\varepsilon > 0$. This implies that if a parameter is influential, it is
also identifiable. These hierarchies dictate the manner in which the terms can be
interchanged. Finally, we note that the concepts of identifiable and influential
parameters are the same for linearly parameterized problems.


Example 6.4. Consider the spring model

$$ m\frac{d^2 z}{dt^2} + kz = 0, \qquad z(0) = z_0, \quad \frac{dz}{dt}(0) = 0 \tag{6.1} $$

discussed in Example 3.2. For $q = [m, k]$, the admissible parameter space is
$\mathbb{Q} = (0,\infty) \times (0,\infty)$. From the solution $z(t) = z_0 \cos(\sqrt{k/m}\, t)$, we observe that $q$ is not
identifiable over all of $\mathbb{Q}$ since the solution is constant along lines $k = Km$. Given
displacements $z(t)$, determination of the slope $K$ is equivalent to specifying the angle
$\theta$ illustrated in Figure 6.2. Hence the identifiable and unidentifiable subspaces of $q$
are

$$ I(q) = \{\theta = \arctan(k/m) \mid 0 < \theta < \pi/2\}, \qquad NI(q) = \{r = \sqrt{k^2 + m^2} \mid r > 0\}, $$

which are orthogonal complements. Hence the direct sum $\mathbb{Q} = I(q) \oplus NI(q)$ is
simply the representation of the first quadrant in terms of polar coordinates. The
specification of the identifiable subspace $I(q)$ is consistent with reformulation of the
problem in terms of the parameter $K = \frac{k}{m}$.

Figure 6.2. Representation of the admissible parameter space $\mathbb{Q}$ and identifiable
and nonidentifiable subspaces $I(q)$ and $NI(q)$ for the spring model (6.1).

For this problem, the noninfluential and influential subspaces are the same as
the unidentifiable and identifiable subspaces. For a given value of $K$, $m$ and $k$ can
be fixed at any values that satisfy $K = \frac{k}{m}$ and yield the correct displacements $z(t)$
for all $t$.
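The unidentifiability in Example 6.4 is easy to confirm numerically. The Python sketch below (NumPy) evaluates $z(t) = z_0\cos(\sqrt{k/m}\,t)$ for several hypothetical $(m,k)$ pairs sharing $K = k/m = 4$: the displacements and the angle $\theta$ agree, while the radius $r$ differs.

```python
import numpy as np

def displacement(t, m, k, z0=1.0):
    """Solution z(t) = z0 cos(sqrt(k/m) t) of (6.1) with zero initial velocity."""
    return z0 * np.cos(np.sqrt(k / m) * t)

t = np.linspace(0.0, 10.0, 201)
pairs = [(1.0, 4.0), (2.5, 10.0), (0.1, 0.4)]   # hypothetical (m, k), all with k/m = 4
curves = [displacement(t, m, k) for m, k in pairs]
thetas = [np.arctan(k / m) for m, k in pairs]    # identifiable polar coordinate theta
radii = [np.hypot(k, m) for m, k in pairs]       # unidentifiable polar coordinate r

print(all(np.allclose(c, curves[0]) for c in curves))   # identical displacements
print(thetas, radii)
```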


The objective of this chapter is to provide techniques that can be used to
construct these subspaces when the complexity of models precludes sole reliance on
expert opinion to determine the identifiable and influential parameters. The relation
between these concepts and the local and global sensitivity relations in Chapters 14
and 15 is illustrated in Figure 6.1. We first note that the local sensitivity must
be zero at a point $q^*$, $\frac{\partial y}{\partial q}(q^*) = 0$, in order for the parameter to be unidentifiable;
however, this could be an inflection point, so this condition does not ensure that the
parameter is unidentifiable. The difficulty with using local sensitivity measures
to establish identifiability is that it must be checked for all admissible parameters,
which is typically infeasible. This motivates the linear algebra and statistical
techniques discussed in this chapter and the global sensitivity methods of Chapter 15.

To simplify the discussion, we focus in this chapter solely on the relation between
inputs $q$ and output responses $y$. Modifications to accommodate independent
variables $t$ or $x$ are discussed in Section 15.3.


6.1 Linearly Parameterized Problems 

We consider first the linearly parameterized problem

$$ y = Aq, \tag{6.2} $$

where $q \in \mathbb{R}^p$, $y \in \mathbb{R}^n$, and $A$ is an $n \times p$ matrix. The following examples illustrate
matrix subspaces that will be used to identify and isolate unidentifiable parameters.
Example 6.5. Consider the model

$$ y_i = q_2 x_i, \qquad i = 1, 2, 3, $$

with parameters $q = [q_1, q_2]$. The associated linear system is

$$ \begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix} = \begin{bmatrix} 0 & x_1 \\ 0 & x_2 \\ 0 & x_3 \end{bmatrix} \begin{bmatrix} q_1 \\ q_2 \end{bmatrix}, $$

where $\mathrm{rank}(A) = 1$. The unidentifiable and identifiable subspaces are specified by
the relations

$$ NI(q) = \mathcal{N}(A) = \left\{ c \begin{bmatrix} 1 \\ 0 \end{bmatrix},\ c \in \mathbb{R} \right\}, \qquad I(q) = \mathcal{R}(A^T) = \left\{ c \begin{bmatrix} 0 \\ 1 \end{bmatrix},\ c \in \mathbb{R} \right\}, $$

where $\mathcal{N}(A)$ and $\mathcal{R}(A^T)$ are the null space and range of $A$ and its transpose. We
note that the latter is the row space of $A$ and that $\mathcal{N}(A)$ and $\mathcal{R}(A^T)$ are orthogonal
complements. Furthermore, we observe that the unidentifiable and identifiable
subspaces can also be specified by $\mathcal{N}(A^T A)$ and $\mathcal{R}(A^T A)$. This is due to the property
that

$$ \mathcal{N}(A^T A) = \mathcal{N}(A), \qquad \mathcal{R}(A^T A) = \mathcal{R}(A^T) $$

for all $A \in \mathbb{R}^{n \times p}$; see Fact 4.21 of [118].


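Example 6.6. Example 6.5 illustrates unidentifiable and identifiable subspaces that
are aligned with the coordinate axes. To illustrate when this is not the case, consider
the linear system

$$ y = \begin{bmatrix} 2 & 1 \end{bmatrix} \begin{bmatrix} q_1 \\ q_2 \end{bmatrix}. $$

Here the unidentifiable and identifiable subspaces are

$$ NI(q) = \mathcal{N}(A) = \left\{ c \begin{bmatrix} 1 \\ -2 \end{bmatrix},\ c \in \mathbb{R} \right\}, \qquad I(q) = \mathcal{R}(A^T) = \left\{ c \begin{bmatrix} 2 \\ 1 \end{bmatrix},\ c \in \mathbb{R} \right\}. $$

We again note that $\mathcal{N}(A)$ and $\mathcal{R}(A^T)$ are orthogonal complements. Example 6.4
illustrates subspaces that are not aligned with coordinate axes for a nonlinearly
parameterized problem.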


Property 6.7. For the linear problem (6.2), the unidentifiable and identifiable
subspaces are specified by the null space and range relations

$$ NI(q) = \mathcal{N}(A) = \mathcal{N}(A^T A), \qquad I(q) = \mathcal{R}(A^T) = \mathcal{R}(A^T A). $$





The formulation in terms of $A^T A$ is important since it relates identifiability to
the Fisher information matrix $\mathcal{F} = A^T A$ and covariance matrix $V = \sigma_0^2 (A^T A)^{-1}$
defined in Table 7.2. This motivates the observation that the Fisher information
matrix $A^T A$ will be singular, so that the covariance matrix $V$ is not defined, for
unidentifiable parameter sets.


6.1.1 Deterministic Algorithms

Algorithms to construct the identifiable and unidentifiable subspaces employ QR
or singular value decompositions to compute the rank $r$ of $A$ along with $\mathcal{N}(A)$ and
$\mathcal{R}(A^T)$. We consider first deterministic representations that can be used when $A$
is small to moderate in size. We also focus on the case when
there are more parameters than measurements, $p > n$, and $A$ is rank deficient,
$r < \min(n, p)$. Techniques to address the case $n > p$ are analogous and are detailed
in the references.

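Singular Value Decomposition (SVD)

The SVD of $A$ is

$$ A = U \Sigma V^T, \tag{6.3} $$

where $U \in \mathbb{R}^{n \times n}$ and $V \in \mathbb{R}^{p \times p}$ are orthogonal and $\Sigma \in \mathbb{R}^{n \times p}$ has the form
$\Sigma = [S \ \ 0]$, where

$$ S = \begin{bmatrix} \sigma_1 & & \\ & \ddots & \\ & & \sigma_n \end{bmatrix}, \qquad \sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_n \ge 0, \tag{6.4} $$

is a diagonal matrix comprised of the $n$ ordered singular values $\sigma_i$. The numerical
rank $r$ is determined by the number of singular values greater than or equal to a
specified tolerance $\varepsilon$. The columns of $U$ and $V$ are termed the left and right singular
vectors of $A$.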

The decomposition

$$ U = [U_r \ \ U_{n-r}], \quad U_r \in \mathbb{R}^{n \times r}, \ U_{n-r} \in \mathbb{R}^{n \times (n-r)}, $$
$$ V = [V_r \ \ V_{p-r}], \quad V_r \in \mathbb{R}^{p \times r}, \ V_{p-r} \in \mathbb{R}^{p \times (p-r)}, $$

isolates the singular vectors corresponding to the nonzero singular values. Based
on this decomposition, $A$ can be expressed as

$$ A = U_r S_r V_r^T, \tag{6.5} $$

where

$$ S_r = \begin{bmatrix} \sigma_1 & & \\ & \ddots & \\ & & \sigma_r \end{bmatrix}. $$

As detailed in Section 4.3 of [119], singular vectors in $V_{p-r}$ provide a basis for
$\mathcal{N}(A)$ and those in $V_r$ yield a basis for $\mathcal{R}(A^T)$.

Whereas this decomposition can in theory be used to construct $I(q) = \mathcal{R}(A^T)$, it
is illustrated in [120], for rank-deficient Jacobian construction, that singular vectors
can be inaccurate if $A$ is close to a matrix of lower rank. This issue is avoided by
rank-revealing QR algorithms.
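The SVD-based construction can be sketched in Python with NumPy. The function below returns the numerical rank and orthonormal bases taken from $V_r$ and $V_{p-r}$; its default tolerance, $\max(n,p)\,\epsilon_{\text{mach}}\,\sigma_1$, is a common convention (similar in spirit to MATLAB's `rank`) and is an assumption here rather than the book's choice. Applied to Example 6.6, it recovers directions proportional to $[2,1]^T$ and $[1,-2]^T$.

```python
import numpy as np

def svd_subspaces(A, tol=None):
    """Numerical rank r and orthonormal bases for R(A^T) (= I(q)) and N(A) (= NI(q)).

    V_r (first r right singular vectors) spans R(A^T); V_{p-r} spans N(A).
    """
    U, s, Vt = np.linalg.svd(A)                # full_matrices=True, so Vt is p x p
    if tol is None:
        tol = max(A.shape) * np.finfo(float).eps * (s[0] if s.size else 0.0)
    r = int(np.sum(s > tol))
    V = Vt.T
    return r, V[:, :r], V[:, r:]

A = np.array([[2.0, 1.0]])                     # Example 6.6
r, I_basis, NI_basis = svd_subspaces(A)
print(r)                                       # numerical rank
print(I_basis.ravel(), NI_basis.ravel())       # multiples of [2, 1]/sqrt(5) and [1, -2]/sqrt(5)
```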

QR Algorithms

We focus on QR factorizations for $A^T$ rather than $A$ for two reasons: $A^T$
fits within the "tall and skinny" framework of the theory, and we are
interested in $\mathcal{R}(A^T)$ to construct the subspace $I(q)$ of identifiable parameters. The
QR factorization of $A^T$ is

$$ A^T = QR, $$

where $Q \in \mathbb{R}^{p \times p}$ is orthogonal and $R \in \mathbb{R}^{p \times n}$ is upper triangular. For full-rank
matrices, the first $n$ columns of $Q$ form an orthonormal basis for $\mathcal{R}(A^T)$. The next
example illustrates that this is not generally true for rank-deficient matrices, where
$\mathrm{rank}(A) = \mathrm{rank}(A^T) = r < n$.


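Example 6.8. Consider the QR decomposition

$$ A^T = \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{bmatrix} \begin{bmatrix} 0 & \frac{1}{\sqrt{2}} \\ 0 & \frac{1}{\sqrt{2}} \end{bmatrix} = QR, \tag{6.6} $$

where $\mathrm{rank}(A) = 1$. We observe that neither column of $Q$ forms a basis for
$\mathcal{R}(A^T) = \{[0, c]^T,\ c \in \mathbb{R}\}$.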


This has been addressed by rank-revealing QR algorithms that introduce a
permutation matrix $P$ which pivots the columns of $A^T$ so that the matrix $R$ in the
resulting QR factorization

$$ A^T P = QR = Q \begin{bmatrix} R_{11} & R_{12} \\ 0 & R_{22} \end{bmatrix} $$

has an $r \times r$ upper triangular block $R_{11}$ whose diagonal elements are nonzero; see
Section 5.4.1 of [J}7] or [S3, IDO]. This separates the linearly independent columns
of $R$ from those that are linearly dependent and ensures that the first $r$ vectors
of $Q$ provide a basis for $\mathcal{R}(A^T)$. This can be accomplished using the MATLAB
command [Q,R,P] = qr(A').


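Example 6.9. For $A^T$ given in (6.6), the QR algorithm with column pivoting yields

$$ A^T P = \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}, \quad P = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}, \quad Q = \begin{bmatrix} 0 & -1 \\ -1 & 0 \end{bmatrix}, \quad R = \begin{bmatrix} -1 & 0 \\ 0 & 0 \end{bmatrix}, $$

so the first column of $Q$ is a basis for $\mathcal{R}(A^T)$. The fact that $-Q$, $-R$ yield the same
result illustrates that the QR factorization is not unique.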
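In Python, SciPy's `scipy.linalg.qr` with `pivoting=True` plays the role of MATLAB's `[Q,R,P] = qr(A')`, except that it returns the permutation as an index array `piv` with `At[:, piv] = Q @ R` rather than as a matrix. A sketch applied to the matrix of Example 6.9; the relative rank tolerance is an assumed choice, and the signs of $Q$ and $R$ may differ from those shown above because the factorization is not unique.

```python
import numpy as np
from scipy.linalg import qr

At = np.array([[0.0, 0.0],
               [0.0, 1.0]])                      # A^T from (6.6), rank 1

Q, R, piv = qr(At, pivoting=True)               # column-pivoted QR: At[:, piv] = Q @ R
tol = max(At.shape) * np.finfo(float).eps * abs(R[0, 0])
r = int(np.sum(np.abs(np.diag(R)) > tol))       # numerical rank from the diagonal of R
basis = Q[:, :r]                                 # orthonormal basis for R(A^T) = I(q)

print(piv)             # column order chosen by the pivoting
print(r)               # numerical rank
print(basis.ravel())   # +/- [0, 1]
```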
6.1.2 Random Algorithms

The SVD or pivoted QR factorizations can be employed to construct the identifiable
subspace $I(q) = \mathcal{R}(A^T)$ for small to moderate size matrices $A$. The difficulty arises
for models used for applications such as neutron transport or systems biology, where
$n$ and $p$ can be on the order of millions. For such problems, one cannot store the
matrix $A$ or compute the QR or SVD factorizations. This motivates the use of
random algorithms to construct low-rank approximations to $A$ or $A^T$ that facilitate
SVD or QR factorization of $A$ or $A^T$. We summarize briefly the range-finding
algorithm detailed in [105] as applied to $A \in \mathbb{R}^{n \times p}$.

This algorithm is comprised of two broad components.

Stage 1. Construct a low-dimensional subspace that adequately approximates the
action of the matrix when only products $y = Aq$ are available. To construct a basis
for the range of $A$, we seek an orthonormal matrix $Q$ with $r = r(\varepsilon)$ columns whose
$r$-dimensional range approximates $\mathcal{R}(A)$ in the sense that

$$ \|A - QQ^T A\| \le \varepsilon, \tag{6.7} $$

where $\|\cdot\|$ is the operator norm. The objective is to make $r$ as small as possible.
This step is highly amenable to random sampling algorithms.

Stage 2. Use this low-rank $Q$ to efficiently compute SVD or pivoted QR
factorizations of $A$ using the deterministic algorithms discussed in Section 6.1.1. To
illustrate, one could form the matrix $B = Q^T A$, whose SVD $B = \tilde{U} \Sigma V^T$ can be
efficiently computed. By forming $U = Q\tilde{U}$, it follows that $A \approx U \Sigma V^T$. Details
regarding efficient factorization algorithms for this second stage are discussed in
Section 5 of [105].

The heart of the algorithm focuses on the construction of a low-rank matrix 
Q whose range approximates that of A. 

Algorithm 6.10 (Random Range Finder).

1. Choose $\ell$ random inputs $q_i$ and compute outputs $y_i = Aq_i$, which are compiled
in the $n \times \ell$ matrix $Y$.

2. Take a pivoted QR factorization $Y = QR$ to construct a matrix $Q$ whose
columns form an orthonormal basis for the range of $Y$.

Details regarding the choice of $\ell > r$, the effect of the distribution for $q$,
implementation when the numerical rank of $A$ is unknown, and error estimates
for the algorithm are provided in Section 4 of [105]. The use of the algorithm to
construct hybrid methods for model reduction is detailed in [1 . 
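A compact Python sketch of Algorithm 6.10 together with the Stage 2 SVD (NumPy and SciPy). Stage 1 touches $A$ only through the user-supplied product `matvec`; the standard normal inputs and the relative tolerance used to truncate the pivoted QR factor are assumptions (Example 6.11 draws uniform inputs). Stage 2, as written, forms $B = Q^TA$ explicitly, which is only practical when $A$ is available; for very large problems $B = (A^T Q)^T$ would be assembled from products with $A^T$.

```python
import numpy as np
from scipy.linalg import qr, svd

def random_range_finder(matvec, p, ell, rng, tol=1e-10):
    """Stage 1 (Algorithm 6.10 sketch): orthonormal Q whose range approximates R(A).

    matvec(q) must return A q. The ell random inputs are standard normal here
    (an assumption). Columns of the pivoted QR factor are kept while |R_ii|
    exceeds tol * |R_11|.
    """
    Y = np.column_stack([matvec(rng.standard_normal(p)) for _ in range(ell)])  # n x ell
    Qfull, R, _ = qr(Y, mode="economic", pivoting=True)
    d = np.abs(np.diag(R))
    r = int(np.sum(d > tol * d[0]))
    return Qfull[:, :r]

def stage2_svd(A, Q):
    """Stage 2: SVD of the small matrix B = Q^T A, then U = Q U_tilde, so A ~ U S V^T."""
    B = Q.T @ A
    Ut, s, Vt = svd(B, full_matrices=False)
    return Q @ Ut, s, Vt

rng = np.random.default_rng(5)
A = rng.standard_normal((200, 10)) @ rng.standard_normal((10, 500))   # hypothetical rank-10 A
Q = random_range_finder(lambda q: A @ q, A.shape[1], ell=25, rng=rng)
U, s, Vt = stage2_svd(A, Q)
print(Q.shape, np.linalg.norm(A - Q @ (Q.T @ A), 2))   # residual in the sense of (6.7)
```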

Example 6.11. To illustrate the deterministic and random algorithms for isolating
and constructing the identifiable subspace $I(q) = \mathcal{R}(A^T)$, we consider the function

$$ y_i = \sum_{k=1}^{p} \sin(2\pi k t_i)\, q_k $$

evaluated at the $n$ equally spaced points $t_i = (i-1)\Delta t$, $\Delta t = \frac{1}{n-1}$, $i = 1, \ldots, n$, on
the interval $[0,1]$. The resulting linear system is

$$ \begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix} = \begin{bmatrix} \sin(2\pi t_1) & \cdots & \sin(2\pi p t_1) \\ \vdots & & \vdots \\ \sin(2\pi t_n) & \cdots & \sin(2\pi p t_n) \end{bmatrix} \begin{bmatrix} q_1 \\ \vdots \\ q_p \end{bmatrix}. $$

As we will illustrate, aliasing properties of the sine functions can be used to predict 
the rank of A and its factorizing matrices. This allows us to check the accuracy of 
linear algebra predictions for large parameter and response dimensions p and n. 

Case i: $n = p = 5$

To illustrate the effects of aliasing, we first consider the low-dimensional case
$p = n = 5$. In Figure 6.3, we plot each column of $A$ as a function of the points $t_i$.
It is observed that $\sin(2\pi k t_i) = 0$ for $k = 2, 4$ and that $\sin(2\pi t_i) = -\sin(2\pi \cdot 3 t_i) =
\sin(2\pi \cdot 5 t_i)$ for this set of points $t_i$. Since there is only one linearly independent
column, the rank of $A$, and hence the dimension of the identifiable subspace $I(q)$,
is one. To illustrate this, we take the SVD and pivoted QR factorizations

$$ A = U_r S_r V_r^T, \qquad A^T P = QR $$

and note that there is only one leading diagonal element in $S_r$ and $R$ greater than
the specified tolerance. The first column of $V_r$ or $Q$,

$$ V(:,1) = Q(:,1) = [-0.5774,\ 0,\ 0.5774,\ 0,\ -0.5774]^T, $$

provides a basis for the identifiable subspace $I(q) = \mathcal{R}(A^T)$.
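Case i can be reproduced in a few lines of Python (NumPy and SciPy). The helper `aliasing_matrix` builds $A$ from the sampled sines; the relative tolerance $10^{-10}\sigma_1$ is an assumed choice, since entries such as $\sin(4\pi t_i)$ vanish only up to rounding. Signs of singular vectors are arbitrary, so the computed basis may be the negative of the vector shown above. Applying the same helpers to `aliasing_matrix(101, 1000)` with a relative tolerance such as $10^{-8}$ should give $r = 49$, consistent with Case ii.

```python
import numpy as np
from scipy.linalg import qr

def aliasing_matrix(n, p):
    """A[i-1, k-1] = sin(2 pi k t_i), t_i = (i - 1)/(n - 1), i = 1..n, k = 1..p."""
    t = np.linspace(0.0, 1.0, n)
    k = np.arange(1, p + 1)
    return np.sin(2.0 * np.pi * np.outer(t, k))

def numerical_rank(s, rel_tol=1e-10):
    """Number of singular values above rel_tol * sigma_1 (assumed tolerance rule)."""
    return int(np.sum(s > rel_tol * s[0]))

A = aliasing_matrix(5, 5)                      # Case i
U, s, Vt = np.linalg.svd(A)
r = numerical_rank(s)
v1 = Vt[0]                                     # first right singular vector

Q, R, piv = qr(A.T, pivoting=True)             # pivoted QR of A^T
r_qr = int(np.sum(np.abs(np.diag(R)) > 1e-10 * abs(R[0, 0])))

print(r, r_qr)              # numerical rank from the SVD and from the QR factor
print(np.round(v1, 4))      # +/- [0.5774, 0, -0.5774, 0, 0.5774] up to rounding
print(np.round(Q[:, 0], 4))
```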

Case ii: $n = 101$, $p = 1000$

We now turn to the moderate-dimension problem where $n = 101$ responses
are constructed using $p = 1000$ parameters. Because of aliasing, one can establish
that $\sin(2\pi t_i) = \sin(2\pi \cdot 101 t_i) = \sin(2\pi \cdot 201 t_i) = \cdots = \sin(2\pi \cdot 901 t_i)$, as illustrated
in Figure 6.4(a), with similar relations for $k = 102, \ldots, 199$. This is reflected in
the result $\mathrm{rank}(A) = 100$ returned by MATLAB. However, as illustrated by the
singular values plotted in Figure 6.5(a), 51 of the 100 singular values are essentially
zero, although their magnitude of $10^{-12}$ is larger than the tolerance employed when
computing the rank. This is again due to aliasing of the type illustrated in Figure 6.4(b),
where it is illustrated that $\sin(2\pi t_i) \sim \sin(2\pi \cdot 51 t_i)$. Hence the rank of
$A$ and dimension of the identifiable subspace $\mathcal{I}(q)$ are actually $r = 49$. The first 49
columns of $Q$ or $V_r$ provide a basis for this subspace.

Figure 6.3. Columns of $A$ versus the points $t_i$, $i = 1, \ldots, 5$.

Figure 6.4. Aliasing of $\sin(2\pi k t_i)$ for (a) $k = 1, 101, 201, 901$ and (b) $k = 1, 51$.
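The following sketch checks the Case ii rank prediction by computing the singular values directly. The relative cutoff `1e-8 * s[0]` is an assumption chosen to sit well above the rounding-level singular values. By the aliasing argument, each distinct column $\sin(2\pi m t_i)$, $m = 1, \ldots, 49$, has squared norm 50 and appears with coefficient $+1$ in 10 columns of $A$ and $-1$ in another 10, so all 49 nonzero singular values should equal $\sqrt{50 \cdot 20} = \sqrt{1000}$.

```python
import numpy as np

n, p = 101, 1000
t = np.linspace(0.0, 1.0, n)                          # t_i = (i-1)/100
A = np.sin(2.0 * np.pi * np.outer(t, np.arange(1, p + 1)))

s = np.linalg.svd(A, compute_uv=False)                # singular values, descending
tol = 1e-8 * s[0]                                     # assumed relative cutoff
r = int(np.sum(s > tol))
print(r)                                              # 49
print(np.allclose(s[:r], np.sqrt(1000.0)))            # all 49 equal sqrt(1000)
print(np.linalg.matrix_rank(A))                       # default cutoff; can exceed 49
```

The last line uses NumPy's default rank tolerance, which, like the default in MATLAB, can count rounding-level singular values as nonzero.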

We also employ the random range-finding Algorithm 6.10 with $\ell = 75$ random
parameter vectors drawn from a uniform distribution to construct an orthogonal
matrix $Q$ whose range approximates $\mathcal{R}(A)$ in the sense of (6.7). The singular
values of $D = Q^T A$ are compared with those computed directly by the factorization
$A = U_r S_r V_r^T$ in Figure 6.5. The absolute differences illustrate the accuracy of the
random algorithm.

Figure 6.5. (a) Singular values of $A$ computed via the factorization $A = U_r S_r V_r^T$
and of $D = Q^T A$ computed using the random Algorithm 6.10 and (b) absolute
difference between singular values.
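A minimal sketch of a basic randomized range finder applied to Case ii, assuming uniform $[0, 1]$ test vectors as in the text. It is not a transcription of Algorithm 6.10, whose details may differ. It reports the absolute differences between the singular values of $A$ and of $D = Q^T A$, the quantity plotted in Figure 6.5(b).

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, ell = 101, 1000, 75
t = np.linspace(0.0, 1.0, n)
A = np.sin(2.0 * np.pi * np.outer(t, np.arange(1, p + 1)))

Omega = rng.uniform(0.0, 1.0, size=(p, ell))    # ell random parameter vectors
Y = A @ Omega                                    # samples of the range of A
Q, _ = np.linalg.qr(Y)                           # orthonormal basis, n x ell
D = Q.T @ A                                      # small ell x p matrix

s_direct = np.linalg.svd(A, compute_uv=False)
s_random = np.linalg.svd(D, compute_uv=False)
err = np.abs(s_direct[:ell] - s_random)          # absolute differences
print(err[:49].max())                            # rounding-level for the 49 nonzero values
```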
Case iii: $n = 101$, $p = \ldots$

Here we consider the performance of the random range-finding Algorithm 6.10
when $A$ is too large to be stored or factored. For the reasons discussed in Case ii,
the theoretical rank of $A$ is 49. We again employ $\ell = 75$ random vectors drawn
from a uniform distribution to compute $Y$ and hence $Q$ and $R$. As in Case ii, $R$
has 49 diagonal values that exceed the prescribed tolerance, so the first 49 columns of $Q$
can be used as a basis to approximate the range of $A$. Hence the random algorithm
remains viable, whereas the deterministic algorithms will be infeasible.
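When $A$ cannot be stored, only its action is needed. The sketch below assumes the entries $\sin(2\pi k t_i)$ can be regenerated on demand, so $Y = A\Omega$ is accumulated one block of columns at a time, with the matching block of $\Omega$ drawn inside the loop. The function name `apply_A_blocked`, the block size, and the choice $p = 10{,}000$ (kept small so the example runs quickly) are illustrative assumptions, as is the relative tolerance used to read the rank from the diagonal of $R$.

```python
import numpy as np

def apply_A_blocked(t, p, ell, rng, block=1000):
    """Return Y = A @ Omega without forming A or Omega in full.

    A[i, k-1] = sin(2*pi*k*t_i); Omega has iid uniform(0, 1) entries generated
    block by block, so only an n x block slice of A is ever held in memory.
    """
    Y = np.zeros((t.size, ell))
    for start in range(1, p + 1, block):
        k = np.arange(start, min(start + block, p + 1))
        A_blk = np.sin(2.0 * np.pi * np.outer(t, k))            # n x len(k)
        Omega_blk = rng.uniform(0.0, 1.0, size=(k.size, ell))    # len(k) x ell
        Y += A_blk @ Omega_blk
    return Y

rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 101)
Y = apply_A_blocked(t, p=10_000, ell=75, rng=rng)
Q, R = np.linalg.qr(Y)                         # Q is 101 x 75
d = np.abs(np.diag(R))
r = int(np.sum(d > 1e-6 * d.max()))            # assumed relative tolerance
print(r)                                       # expected: 49
```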


6.2 Nonlinearly Parameterized Problems

The determination of identifiable and influential parameter subspaces for nonlinear
problems

$$y = f(q), \qquad q = [q_1, \ldots, q_p]$$

is significantly more difficult than the linear case for two reasons: one cannot directly
apply well-established techniques from linear algebra, and one must apply
global rather than local analysis. We discuss two approaches for these problems.
The first is to employ the global sensitivity analysis techniques of Chapter 15 to
construct sensitivity indices that quantify the influence of parameter uncertainty
on the response variance. This approach has the advantage that it requires no
linearization or assumptions regarding monotonicity, and it incorporates information
about the parameter distributions. However, the computation of these sensitivity
indices can be prohibitively expensive for large parameter dimensions since it requires
quadrature over the $p$-dimensional parameter space. In the second approach, one approximates the local
sensitivities $\partial f / \partial q_i$ to linearize the problem at values in the parameter space. The
deterministic and random techniques of Section 6.1 are subsequently applied to the
linearized problem. The primary disadvantage of this approach is that it is difficult
to ensure global identifiability criteria.


6.2.1 Variance-Based Methods for Parameter Selection

To illustrate a nonlinear technique to select the most influential parameters, we briefly
summarize the variance-based global sensitivity analysis method that is detailed
and illustrated with examples in Section 15.1.

We consider the scalar-valued model

$$Y = f(Q),$$

where $Q = [Q_1, \ldots, Q_p] \in \Gamma$ and $Q_i$ are independent random variables that are
assumed here to be uniformly distributed on $[0, 1]$ so that $\Gamma = [0, 1]^p$. We also
consider the second-order Sobol or HDMR expansion

$$f(q) = f_0 + \sum_{i=1}^{p} f_i(q_i) + \sum_{1 \le i < j \le p} f_{ij}(q_i, q_j),$$

where

$$\begin{aligned} f_0 &= \int_{\Gamma} f(q)\, dq, \\ f_i(q_i) &= \int_{\Gamma^{p-1}} f(q)\, dq_{\sim i} - f_0, \\ f_{ij}(q_i, q_j) &= \int_{\Gamma^{p-2}} f(q)\, dq_{\sim \{ij\}} - f_i(q_i) - f_j(q_j) - f_0. \end{aligned} \tag{6.8}$$

Here $\Gamma^{p-1} = [0, 1]^{p-1}$, $\Gamma^{p-2} = [0, 1]^{p-2}$, and the notation $q_{\sim i}$ denotes the vector
having all the components of $q$ except those in the set $i$.

The first- and second-order Sobol indices are

$$S_i = \frac{D_i}{D}, \qquad S_{ij} = \frac{D_{ij}}{D},$$

where the total variance $D$ of the response $Y$ is

$$D = \int_{\Gamma} f^2(q)\, dq - f_0^2$$

and the partial variances are

$$D_i = \int_0^1 f_i^2(q_i)\, dq_i, \qquad D_{ij} = \int_0^1 \int_0^1 f_{ij}^2(q_i, q_j)\, dq_i\, dq_j.$$

The total sensitivity indices

$$S_{T_i} = S_i + \sum_{j \ne i} S_{ij}$$

quantify the total effect of the parameter $Q_i$ on $Y$.

It is noted in Remark 15.4 that the condition $S_{T_i} \approx 0$ implies that $Q_i$ is
noninfluential and can be fixed for model calibration and uncertainty quantification.
Hence the total sensitivity indices can be used to establish the set of influential and
noninfluential parameters for nonlinear models. The extension of these relations to
general densities and complete Sobol expansions is detailed in Section 15.1.2.

Whereas the Sobol indices $S_i$ and $S_{T_i}$ provide comprehensive measures for
quantifying the influence of parameter uncertainty on the variance of the response,
their computation can be prohibitively expensive for large parameter dimensions due
to the quadrature required to evaluate $f_0$, $f_i(q_i)$, and $f_{ij}(q_i, q_j)$ in (6.8). Methods
based on linearization of the problem provide an alternative technique to isolate
and quantify influential parameters at significantly reduced computational cost.
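For a rough feel of how the indices are computed in practice, the sketch below replaces the quadrature in (6.8) by Monte Carlo sampling. It uses the Saltelli first-order and Jansen total-effect estimators, one common choice among several and not necessarily the method of Chapter 15. The test function $Y = Q_1 + 2Q_2$ with a dummy input $Q_3$ is hypothetical; its exact indices are $S_1 = S_{T_1} = 0.2$, $S_2 = S_{T_2} = 0.8$, and $S_3 = S_{T_3} = 0$. Each index costs $N$ additional model evaluations, which illustrates the expense noted above.

```python
import numpy as np

def sobol_mc(f, p, N, rng):
    """Monte Carlo estimates of first-order (S) and total (ST) Sobol indices
    for independent inputs Q_i ~ U(0, 1), using the Saltelli first-order and
    Jansen total-effect estimators."""
    A = rng.uniform(size=(N, p))
    B = rng.uniform(size=(N, p))
    fA, fB = f(A), f(B)
    D = np.var(np.concatenate([fA, fB]))      # estimate of the total variance D
    S, ST = np.empty(p), np.empty(p)
    for i in range(p):
        ABi = A.copy()
        ABi[:, i] = B[:, i]                    # A with column i taken from B
        fABi = f(ABi)
        S[i] = np.mean(fB * (fABi - fA)) / D
        ST[i] = 0.5 * np.mean((fA - fABi) ** 2) / D
    return S, ST

def f(q):
    """Hypothetical test function Y = Q1 + 2 Q2; Q3 is a dummy input."""
    return q[:, 0] + 2.0 * q[:, 1]

S, ST = sobol_mc(f, p=3, N=20_000, rng=np.random.default_rng(2))
print(np.round(S, 3), np.round(ST, 3))   # exact values: S = ST = [0.2, 0.8, 0.0]
```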


6.2.2 Parameter Selection Based on Model Linearization

All linearization techniques employ either analytic or approximate values for the
sensitivities evaluated at values in the admissible parameter space. We
first consider the $n \times p$ sensitivity matrix defined componentwise by

$$\chi_{ij}(q^*) = \frac{\partial f_i(q^*)}{\partial q_j},$$

where $q^* = [q_1^*, \ldots, q_p^*]$ is a nominal parameter value. The $p \times p$ Fisher information
matrix is

$$\mathcal{F} = \chi^T \chi.$$

For moderate dimensions $n$ and $p$, the deterministic factorization techniques detailed
in Section 6.1 provide measures to quantify local parameter identifiability
and relative influence in neighborhoods of $q^*$. The performance of parameter selection
algorithms based on $\chi$ and $\mathcal{F}$ is illustrated for the HIV model (3.15) in [23].
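As a small illustration of local identifiability analysis, the sketch below builds $\chi$ by central finite differences and inspects its singular values and the eigenvectors of $\mathcal{F}$. The model $y(t) = q_1 q_2 e^{-q_3 t}$, the nominal value $q^*$, the time grid, and the step size are hypothetical choices. Because $q_1$ and $q_2$ enter only through their product, $\chi$ has a null direction proportional to $[q_1^*, -q_2^*, 0]^T$.

```python
import numpy as np

def model(q, t):
    """Hypothetical model y(t) = q1 * q2 * exp(-q3 t); q1, q2 enter only as a product."""
    return q[0] * q[1] * np.exp(-q[2] * t)

def sensitivity_matrix(f, q, h=1e-6):
    """n x p matrix chi_ij = d f_i / d q_j approximated by central differences."""
    q = np.asarray(q, dtype=float)
    cols = []
    for j in range(q.size):
        dq = np.zeros_like(q)
        dq[j] = h * max(1.0, abs(q[j]))
        cols.append((f(q + dq) - f(q - dq)) / (2.0 * dq[j]))
    return np.column_stack(cols)

t = np.linspace(0.0, 5.0, 50)
q_star = np.array([2.0, 1.5, 0.7])                 # nominal parameter value
chi = sensitivity_matrix(lambda q: model(q, t), q_star)
F = chi.T @ chi                                     # Fisher information matrix
s = np.linalg.svd(chi, compute_uv=False)
rank = int(np.sum(s > 1e-6 * s[0]))                 # assumed relative tolerance
print(rank)                                         # 2
eigvals, eigvecs = np.linalg.eigh(F)                # eigenvalues in ascending order
print(eigvecs[:, 0])                                # close to +/-[0.8, -0.6, 0]
```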

To provide more global techniques to quantify parameter identifiability and
influence, one can evaluate the sensitivities at random parameter values $q^\ell$ sampled
from the parameter density $\rho_Q(q)$. This provides the basis for the parameter selection methods employed
in [1, 66]. To illustrate for the $n = 1$ response, one employs $r$ parameter realizations
$\{q^\ell\}_{\ell=1}^{r}$ to construct the $p \times r$ sensitivity matrix

$$\chi = \left[ \nabla f(q^1), \ldots, \nabla f(q^r) \right], \tag{6.9}$$

where $\nabla f(q^\ell) = \left[ \frac{\partial f}{\partial q_1}(q^\ell), \ldots, \frac{\partial f}{\partial q_p}(q^\ell) \right]^T$. The algorithm in [1] constructs a QR factorization
of $\chi$ and employs the random range-finding techniques outlined in Section 6.1
to isolate and characterize the subspace of influential parameters. This
algorithm is illustrated in the context of a neutron transport model employed for
nuclear reactor design. The algorithm in [66] employs an SVD of $\chi$ to identify an
active subspace for the parameter set, which is then used to construct a kriging-based
response surface surrogate model. The approach is illustrated in the context
of a heat transfer model for a 3-D turbine blade with interior cooling holes.
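The construction (6.9) can be sketched for a hypothetical ridge function $f(q) = \exp(a^T q)$, whose gradient $\nabla f(q) = f(q)\,a$ always points along $a$. With analytic gradients at $r$ uniformly sampled points, the SVD of $\chi$ should expose a one-dimensional influential (active) subspace spanned by $a$. The vector $a$, the sample size, and the uniform density are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
p, r = 5, 40
a = np.array([0.7, 0.3, 0.0, 0.0, 0.0])             # hypothetical ridge direction

def grad_f(q):
    """Analytic gradient of the hypothetical f(q) = exp(a^T q)."""
    return np.exp(a @ q) * a

samples = rng.uniform(0.0, 1.0, size=(r, p))        # r realizations q^l ~ U(0, 1)^p
chi = np.column_stack([grad_f(q) for q in samples]) # p x r matrix as in (6.9)
U, s, _ = np.linalg.svd(chi, full_matrices=False)
print(s / s[0])                                     # one dominant singular value
print(U[:, 0])                                      # close to +/- a / ||a||
```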

The efficient evaluation of the gradient for multiple parameter values comprises
a critical step when simulation codes are computationally expensive and $p$ is large.
For some applications, adjoint methods or automatic differentiation can be used
for gradient computations. For highly nonlinear or complex codes, however, finite-difference
approximations are required to approximate the partial derivatives. For $r$
parameter realizations, brute force computation of $\chi$ in (6.9) using finite differences
would require $2rp$ function evaluations, which is often prohibitive. This can be
reduced to $(p + 1)r$ using the Morris sampling strategy detailed in Section 15.2.

The objective of Morris screening algorithms is to use the difference relation

$$d_i^j = \frac{f(q^j + \Delta e_i) - f(q^j)}{\Delta} \tag{6.10}$$

to construct global sensitivity measures

$$\mu_i^* = \frac{1}{r} \sum_{j=1}^{r} \left| d_i^j(q) \right|, \qquad \sigma_i^2 = \frac{1}{r-1} \sum_{j=1}^{r} \left( d_i^j(q) - \mu_i \right)^2, \qquad \mu_i = \frac{1}{r} \sum_{j=1}^{r} d_i^j(q),$$

which efficiently quantify the relative influence of inputs in high-dimensional problems.
The stepsize $\Delta$ is chosen from the set

$$\Delta \in \left\{ \frac{1}{\ell - 1}, \ldots, 1 - \frac{1}{\ell - 1} \right\},$$

where $\ell$ denotes the level, and $e_i$ is a vector of zeros with one in the $i$th component.
Due to the magnitude of $\Delta$, the difference relation (6.10), which is termed the
elementary effect, is a very coarse approximation to the local sensitivity. Hence the
elementary effects can be used to rank the relative influence of parameters but not
to resolve fine-scale gradient behavior.

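For each index $j = 1, \ldots, r$, one randomly samples a seed value $q^j$ from $\rho_Q(q)$
and then randomly specifies the $p + 1$ parameter values, required to approximate
the $p$ elementary effects, using the random orientation matrix

$$B^* = \left( J_{p+1,1}\, (q^j)^T + \frac{\Delta}{2} \left[ (2B - J_{p+1,p}) D^* + J_{p+1,p} \right] \right) P^*.$$

Here $D^*$ is a $p \times p$ diagonal matrix whose elements are randomly chosen from the set
$\{-1, 1\}$, and the $p \times p$ matrix $P^*$ is constructed by randomly permuting the columns
of a $p \times p$ identity matrix. As in the standard Morris construction, $B$ is the $(p+1) \times p$
strictly lower triangular matrix of ones, and $J_{p+1,1}$ and $J_{p+1,p}$ are matrices of ones of
the indicated dimensions.

The sensitivity matrix $\chi$ can be approximated in a similar manner if one
employs a smaller stepsize $\Delta$.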
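The sketch below implements the orientation-matrix construction above and the measures $\mu_i^*$ and $\sigma_i$ from the elementary effects (6.10). It assumes uniform inputs on $[0, 1]^p$, $\ell = 4$ levels, and the frequently used stepsize $\Delta = \ell / [2(\ell - 1)]$; seeds are drawn from the grid points that keep $q^j + \Delta$ inside $[0, 1]$. The function names and the linear test model are hypothetical. Each trajectory costs $p + 1$ model evaluations, for $r(p + 1)$ in total.

```python
import numpy as np

def morris_trajectory(p, levels, delta, rng):
    """One (p+1) x p orientation matrix
    B* = (J_{p+1,1} q*^T + delta/2 [(2B - J_{p+1,p}) D* + J_{p+1,p}]) P*."""
    B = np.tril(np.ones((p + 1, p)), k=-1)             # strictly lower triangular ones
    J = np.ones((p + 1, p))
    D = np.diag(rng.choice([-1.0, 1.0], size=p))       # random signs D*
    P = np.eye(p)[:, rng.permutation(p)]               # random column permutation P*
    grid = np.arange(levels) / (levels - 1)            # levels 0, 1/(l-1), ..., 1
    seeds = grid[grid <= 1.0 - delta + 1e-12]          # keep q* + delta inside [0, 1]
    q_seed = rng.choice(seeds, size=p)
    return (np.outer(np.ones(p + 1), q_seed) + 0.5 * delta * ((2 * B - J) @ D + J)) @ P

def morris_screening(f, p, r, levels=4, rng=None):
    """Return (mu_star, sigma) from r random trajectories; costs r * (p + 1) calls to f."""
    rng = np.random.default_rng() if rng is None else rng
    delta = levels / (2.0 * (levels - 1))              # a common choice of Delta
    d = np.zeros((r, p))                               # elementary effects d_i^j
    for j in range(r):
        X = morris_trajectory(p, levels, delta, rng)
        fx = np.array([f(x) for x in X])               # p + 1 model evaluations
        for m in range(p):
            step = X[m + 1] - X[m]
            i = int(np.argmax(np.abs(step)))           # the one coordinate that moved
            d[j, i] = (fx[m + 1] - fx[m]) / step[i]    # divides by +delta or -delta
    mu_star = np.abs(d).mean(axis=0)
    sigma = d.std(axis=0, ddof=1)
    return mu_star, sigma

def f(q):
    """Hypothetical model: linear in q[0] and q[1]; q[2] has no effect."""
    return q[0] + 2.0 * q[1]

mu_star, sigma = morris_screening(f, p=3, r=10, rng=np.random.default_rng(4))
print(mu_star, sigma)   # mu_star close to [1, 2, 0]; sigma close to [0, 0, 0]
```

For a linear model every elementary effect equals the corresponding coefficient, so $\sigma_i \approx 0$; nonzero $\sigma_i$ would signal nonlinearity or interactions.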

Morris screening can thus be employed in two ways for parameter selection in
nonlinear problems. The first is based on the use of the random orientation matrix
$B^*$ to coarsely approximate the partial derivatives in the gradient relations (6.9).

Alternatively, it is illustrated in Sections 15.2 and 15.3 that $\mu^*$ and $\sigma$ can often
be used to rank the relative influence of parameters $Q_i$ at a fraction of the cost
required to construct the Sobol indices $S_i$ and $S_{T_i}$. However, the tradeoff is a
characterization that can miss global properties of the nonlinear input-output map.


6.3 Parameter Correlation versus Identifiability

As noted in Definition 4.20, the Pearson correlation coefficient

$$\rho_{XY} = \frac{\mathrm{cov}(X, Y)}{\sigma_X \sigma_Y} \tag{6.11}$$

quantifies the degree to which the random variables $X$ and $Y$ are linearly dependent.
Because $\rho_{XY} \ne 0$ indicates a quantifiable statistical relationship between the
variables, it is sometimes confused with the concept of parameter identifiability,
which is a property of the input-output map $Y = Aq$ or $Y = f(Q)$. This confusion
is due in part to the fact that $\rho_{XY} = \pm 1$ does indicate a linear algebraic relation between
the variables, so they are not jointly identifiable. We illustrate the difference
between parameter correlation and identifiability in the next example.


Example 6.12. Consider the linear model

$$Y_i = Q_1 + Q_2 x_i$$

with $x_1 = 1$, $x_2 = 2$ so that the linear system is

$$\begin{bmatrix} Y_1 \\ Y_2 \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix} \begin{bmatrix} Q_1 \\ Q_2 \end{bmatrix}.$$

We first assume that $Q = [Q_1, Q_2]^T$ is normally distributed with mean $\mu = [0, 0]^T$
and covariance matrix

$$V = \begin{bmatrix} 0.3 & 0.5 \\ 0.5 & 1.0 \end{bmatrix}. \tag{6.12}$$

Here $\mathrm{cov}(Q_1, Q_2) = 0.5$, so $Q_1$ and $Q_2$ are positively correlated, as illustrated by
the scatterplot in Figure 6.6(a). Hence there is a statistical relation between the
random variables.

Because $\mathrm{rank}(A) = \dim(\mathcal{I}(q)) = 2$, both parameters are identifiable. Thus the
parameters are correlated but can be uniquely determined from the response.

We now consider the same problem with $Q_1 \sim N(0, 0.25)$ and $Q_2 = 3Q_1$, as
illustrated in Figure 6.6(b). The linear system in this case is

$$\begin{bmatrix} Y_1 \\ Y_2 \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix} \begin{bmatrix} Q_1 \\ Q_2 \end{bmatrix} = \begin{bmatrix} 4 & 0 \\ 7 & 0 \end{bmatrix} \begin{bmatrix} Q_1 \\ Q_2 \end{bmatrix}.$$

Thus $\mathrm{rank}(A) = 1$, $\mathcal{I}(q) = c[1, 0]^T$ for $c \in \mathbb{R}$, and $Q_2$ is not identifiable.
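The distinction drawn in Example 6.12 can be checked numerically. The sketch samples $Q \sim N(0, V)$ with $V$ from (6.12), estimates the Pearson coefficient (6.11), whose exact value is $0.5/\sqrt{0.3} \approx 0.913$, and compares it with the rank of $A$; the sample size and seed are arbitrary. For the degenerate case it substitutes $Q_2 = cQ_1$ with $c = 3$; the substituted matrix $[[1 + c, 0], [1 + 2c, 0]]$ has rank one for any value of $c$.

```python
import numpy as np

rng = np.random.default_rng(5)
A = np.array([[1.0, 1.0],
              [1.0, 2.0]])                      # Y = A Q with x1 = 1, x2 = 2
V = np.array([[0.3, 0.5],
              [0.5, 1.0]])                      # covariance matrix (6.12)

Q = rng.multivariate_normal(np.zeros(2), V, size=20_000)
rho = np.corrcoef(Q[:, 0], Q[:, 1])[0, 1]       # sample Pearson coefficient
print(rho, 0.5 / np.sqrt(0.3 * 1.0))            # strongly correlated parameters...
print(np.linalg.matrix_rank(A))                 # ...that are identifiable: rank 2

# Degenerate case Q2 = c * Q1: substitute into the system.
c = 3.0
A_dep = A @ np.array([[1.0, 0.0],
                      [c,   0.0]])              # [[1 + c, 0], [1 + 2c, 0]]
print(A_dep, np.linalg.matrix_rank(A_dep))      # rank 1: Q2 not identifiable
```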


This example illustrates that correlation alone does not establish unidentifiability.
However, scatterplots having essentially no width can indicate an algebraic
dependence between parameters that could render them unidentifiable. Since scatterplots
are constructed by sampling from the joint distribution, this is equivalent
to stating that the joint distribution is nearly single-valued or exhibits a parametric
functional form. For linearly dependent variables, this is manifested by values
$\rho_{XY} = \pm 1$ for the Pearson correlation coefficient (6.11). These functional relations
need not be linear, however, since dependencies may be multiplicative; e.g.,

Figure 6.6. (a) Positive correlation of $Q_1$ and $Q_2$ for $Q = [Q_1, Q_2] \sim N(0, V)$
with the covariance matrix (6.12). (b) Linear relationship $Q_2 = 3Q_1$ with $Q_1 \sim N(0, 0.25)$.




Figure 6.7. (a) Unidentifiable parameters for the material model in [116] and (b)
identifiable parameters $\Phi$ and $h$ from the heat equation in Examples 3.5 and 8.12.


$Q_2 = \ldots Q_1$. This is consistent with the result $S_{T_i} \approx 0$ for the global sensitivity indices,
which indicates that variability in the random variable $Q_i$ has minimal impact on
the variability of the response $Y$.

Parameter sets can be directly reduced if the functional relation between parameters
is evident, as for $K = k/m$ in the spring model (6.1). However, this is rarely
the case for complex models.

One can employ the Bayesian model calibration techniques of Chapter 8 to
construct joint densities that are used to judge parameter identifiability. However,
there are two difficulties with this approach. The first is that Bayesian analysis
will exhibit the problems detailed in Section 8.5 if noninformative priors are used
for inference with unidentifiable parameters. Moreover, one would like to select the
identifiable parameter set before the computationally intense process of Bayesian
model calibration. Second, it is often difficult to differentiate unidentifiable from
identifiable parameters based on the width of scatterplots. This is illustrated in
Figure 6.7 for a pair of parameters arising in a smart material model [116] and for
$\Phi$ and $h$ in the heat model of Examples 3.5 and 8.12. The first model can be
reformulated in terms of a single combination of the two parameters, so that the pair is unidentifiable, whereas
$\Phi$ and $h$ are identifiable. However, the widths of the scatterplots are similar. The
techniques detailed in Sections 6.1 and 6.2 provide methods to ascertain identifiable
or influential parameter sets before model calibration to avoid these difficulties.


6.4 Notes and References 


The concepts of local and global identifiability are well established in the differential
equations and systems literature due to the central role that they play in
system identification and control theory. Bellman and Åström provided a framework
for the concept they termed structural identifiability in their 1970 paper,
and Åström and Eykhoff discuss the role of identification for control design in the
1971 survey paper [16]. The reader is referred to [25], and the references therein,
for definitions and theory pertaining to stability and identification for distributed
parameter systems. Details regarding the relation between the concepts of identifiability,
controllability, and observability are provided in [253].

As deterministic and statistical system identification, parameter estimation,
model calibration, global sensitivity analysis, and uncertainty quantification evolved,
the terms influential and uncorrelated parameters have been employed by some researchers
in a manner synonymous with identifiable parameters. The manner in
which these concepts relate to but differ from the property of identifiability is
detailed in this chapter.

Parameter selection techniques for small to moderate, linearly parameterized
problems are based on SVD or pivoted QR factorizations. The theory and algorithms
are detailed in [63, 97, 100, 119]. Random range-finding algorithms, such
as those detailed in [105], can be employed for large problems where $A$ cannot be
stored and only the action of $A$ or $A^T$ is available.

Parameter selection for nonlinear problems is significantly more difficult due
to the lack of a central unifying theory and the necessity of global rather than local
analysis. As detailed in Chapter 15, Sobol indices quantify the global influence of
parameters on the response. Whereas this approach accommodates fairly general
nonlinear and nonmonotonic model behavior as well as general parameter densities,
the computational cost can be prohibitive for very large input dimensions. The
alternative is to approximate local sensitivities and use these linearized relations to
establish influential parameters. To provide more global measures, local sensitivities
are evaluated or approximated at points randomly chosen from the admissible
parameter space. This is the basis for the Morris screening methods detailed in
Chapter 15 and gradient-based methods of [1, 21, 211]. These pseudoglobal
methods will typically be ineffective for problems whose responses vary discontinuously
with respect to parameters. Rather, they are predicated on the physically
and theoretically motivated observation that responses often become increasingly
smooth for very large parameter dimensions [47].

We have omitted a large literature on statistical methods for variable selection
in high-dimensional problems. The survey paper [80] provides an overview of
techniques in this category.


6.5 Exercises 

Exercise 6.1. Verify that the concepts of identifiable and influential parameters 
are the same for linearly parameterized problems. 


Exercise 6.2. Compute the SVD and pivoted QR expansions for the matrix $A$ in
the linear system


y=[2 J 


<h 


f l2 


Show that the identifiable and unidentifiable subspaces are the same as those given
in Example 6.5.
Exercise 6.3. Consider the linear function

$$y = Q_1 + Q_2 x,$$

where $Q = [Q_1, Q_2] \sim N(q, V)$ with

Plot a scatterplot of the parameters $Q_1$ and $Q_2$ to illustrate the correlation. Now
generate $N = 100$ synthetic data values by taking $h = 1/N$ and $x_i = ih$, $i = 1, \ldots, N$,
and sampling from $N(q, V)$ to construct corresponding values $y_i$. Using the theory
of Section 7.2, compute a least squares fit using $\hat{q} = (X^T X)^{-1} X^T y$, where

$$X = \begin{bmatrix} 1 & \cdots & 1 \\ x_1 & \cdots & x_N \end{bmatrix}^T.$$

Plot the synthetic data and least squares fit in the same figure. Finally, compute
the condition number of $X^T X$ to establish that both parameters are identifiable.


Exercise 6.4. Consider the steady-state heat model

$$\frac{d^2 T_s}{dx^2} = \frac{2(a + b)h}{abk}\,[T_s(x) - T_{amb}],$$

$$\frac{dT_s}{dx}(0) = \frac{\Phi}{k}, \qquad \frac{dT_s}{dx}(L) = \frac{h}{k}\,[T_{amb} - T_s(L)]$$

detailed in Example 3.5. For fixed thermal conductivity $k$, use the techniques of
Section 5.2 to establish that the parameters $\Phi$ and $h$ are identifiable.





Chapter 7 

Frequentist Techniques 
for Parameter Estimation 


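The differential equation models of Section 3.2 can be classified as ODE systems

$$\frac{du}{dt} = f(t, u, q), \quad u(t_0) = u_0, \quad u(t, q) \in \mathbb{R}^N,$$

$$y(t, q) = Cu(t, q), \quad C \in \mathbb{R}^{\nu \times N}, \tag{7.1}$$

stationary PDEs

$$\begin{aligned} \mathcal{N}(u, q) &= F(q), & x &\in \mathcal{D}, \\ \mathcal{B}(u, q) &= G(q), & x &\in \partial \mathcal{D}, \\ y(x, q) &= Cu(x, q), \end{aligned} \tag{7.2}$$

or nonstationary PDEs

$$\begin{aligned} \frac{\partial u}{\partial t} &= \mathcal{N}(u, q) + F(q), & x &\in \mathcal{D},\ t \in [t_0, \infty), \\ \mathcal{B}(u, q) &= G(q), & x &\in \partial \mathcal{D},\ t \in [t_0, \infty), \\ u(t_0, x) &= u_0(x), & x &\in \mathcal{D}. \end{aligned} \tag{7.3}$$

Here $y$ and $q$ denote observations and parameters, and $\mathcal{N}$, $F$, $\mathcal{B}$, and $C$ denote differential
operators, source terms, and boundary conditions.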

Additionally, we considered algebraic models

$$A(q)u = F(q). \tag{7.4}$$

If $A(q) \in \mathbb{R}^{n \times n}$ is invertible, we can represent the $n$ observations by

$$y(q) = u(q) = A^{-1}(q) F(q). \tag{7.5}$$

Linear regression is a special case in which the parameter dependency is linear, so

$$y(q) = Xq.$$

Here $X \in \mathbb{R}^{n \times p}$ is termed the design matrix.
In the statistics and inverse problems literature, the observed model response
or QoI is often formulated as

$$y = f(x, q), \tag{7.6}$$

where $x$ are independent variables (e.g., $t$ or $x$) or other known inputs. In the
statistics literature, $x$ are also referred to as explanatory or regressor variables.
The function $f$ generically denotes the map from the independent variables and
parameters to the response. We assume that $f$ is fixed and known in the sense that
there exists a unique modeled response. For nonlinear ODE, PDE, and algebraic
models, however, one can rarely obtain analytic solutions and hence explicit formulations
for $f$. Hence for most problems, we rely on numerical approximations for $f$.
Finally, we note that parameters are often denoted by $\theta$ in the statistics literature.

Throughout our discussion, we assume that we have observations $(x_i, \upsilon_i)$,
$i = 1, \ldots, n$, where the measured quantity of interest $\upsilon_i$ is corrupted by measurement
errors $\varepsilon_i$ so that

$$\upsilon_i = f(x_i, q) + \varepsilon_i, \quad i = 1, \ldots, n. \tag{7.7}$$

The mathematical inverse problem associated with parameter estimation can then
be formulated as follows: given these noisy measurements, determine $q$ in a stable
manner. The associated statistical inverse problem, sometimes referred to as inverse
uncertainty quantification, is to additionally quantify uncertainties associated
with $q$ due to the measurement errors. The assumptions required to approximate $q$
and quantify its uncertainty define frequentist and Bayesian techniques for parameter
estimation.

For sensitivity analysis and uncertainty propagation, the specific roles of the
independent variables are typically of secondary importance, and we are instead
interested in how the model solution varies as a function of the parameters or
inputs $q$. This is facilitated by the representation

$$\upsilon_i = f_i(q) + \varepsilon_i, \quad i = 1, \ldots, n, \tag{7.8}$$

where $f_i(q) \in \mathbb{R}^\nu$ denotes the observed model response and $\upsilon_i \in \mathbb{R}^\nu$ again denotes
measured data. For the models (7.1), (7.2), and (7.4), the model response can be
expressed as the $n \times \nu$ vector

$$\begin{aligned} f(q) &= [f(t_1, q), \ldots, f(t_n, q)]^T, && \text{Evolution Processes}, \\ f(q) &= [f(x_1, q), \ldots, f(x_n, q)]^T, && \text{Stationary Processes}, \\ f(q) &= [f_1(q), \ldots, f_n(q)]^T, && \text{Algebraic Models}. \end{aligned} \tag{7.9}$$

Hence the dependence of the observed model response on the independent or regressor
variables is suppressed in the notation $f(q)$. Readers are referred to Section 3.4
for discussion regarding alternative notation used for parameters in various disciplines.

For evolution models, we will have $\nu \ge 1$ experimental measurements and
model responses at each time $t_j$, $j = 1, \ldots, n$. For stationary processes and algebraic
models, we consider scalar measurements and model evaluations, so $\nu = 1$.
7.1 Parameter Estimation from a Frequentist Perspective

We recall George E.P. Box's quote "Essentially, all models are wrong, but some are
useful" (page 424 of [38]). Thus the mathematical models will exhibit model errors,
which we collectively denote by the vector $\delta = [\delta_1, \ldots, \delta_n]$, along with measurement
errors. To accommodate these errors, we consider statistical models of the form

$$\Upsilon = f(q_0) + \delta + \varepsilon, \tag{7.10}$$

where $\Upsilon = [\Upsilon_1, \ldots, \Upsilon_n]^T$ is a random vector whose realization $\upsilon = [\upsilon_1, \ldots, \upsilon_n]^T$
is comprised of measurements from an experiment. Measurement errors are represented
by the random vector $\varepsilon = [\varepsilon_1, \ldots, \varepsilon_n]^T$, and errors resulting for a specific
experiment are denoted by $e = [e_1, \ldots, e_n]^T$.

As detailed in Section 4.8.1, a basic tenet of frequentist inference is the assumption
that parameters are fixed but possibly unknown. Hence $q_0$ represents
the true but unknown value of the parameter set that generated the observations
$\upsilon = [\upsilon_1, \ldots, \upsilon_n]$. We emphasize that since $q_0$ is not a random vector, the model
response $f(q_0)$ is a deterministic quantity.

If the quantification of modeling errors constitutes one of the goals, then it
is necessary to consider the statistical model (7.10) and characterize the modeling
errors in an efficient and statistically consistent manner, as detailed in Chapter 12.
For many applications, however, the modeling and measurement errors can be collectively
quantified by the random vector $\varepsilon$, in which case one would employ the
statistical model

$$\Upsilon = f(q_0) + \varepsilon \tag{7.11}$$

in which errors are additive.

To construct likelihoods in the manner detailed in Section 4.3, we typically
assume that the random variables are unbiased and iid, which is often not the case
if they are comprised of both modeling and measurement errors. For example, we
illustrate in Chapter 12 that residuals for a structural model are highly dependent on
the magnitude of $\upsilon$ even though the model is providing an accurate fit to measured
data. Hence for some applications, the statistical model

$$\Upsilon_i = f_i(q_0)(1 + \varepsilon_i), \quad i = 1, \ldots, n, \tag{7.12}$$

with multiplicative errors may be more appropriate since $\mathrm{var}(\Upsilon_i)$ will depend on
the magnitude of $f_i(q_0)$.

The goal when calibrating models is to determine parameter estimates $\hat{q}$ so
that the model response $f(\hat{q})$ fits the data in some optimal sense. We showed in
Section 4.3 that this can be achieved by constructing an estimator $\hat{\mathsf{q}}$ that estimates
$q_0$ in a statistically reasonable manner.¹ It was demonstrated that OLS estimators

$$\hat{\mathsf{q}}_{OLS} = \underset{q \in \mathcal{Q}}{\mathrm{argmin}} \sum_{i=1}^{n} [\Upsilon_i - f_i(q)]^2 \tag{7.13}$$

and maximum likelihood estimators both achieve this goal and are equivalent for
certain assumptions regarding the distribution of errors.

¹The notation $\hat{\mathsf{q}}$ for the estimator is not universal, and many texts denote the estimate by $\hat{\mathsf{q}}$.
Hence care must be taken to establish the convention employed in a specific text.


Remark 7.1. Because the estimator $\hat{\mathsf{q}}$ is a random variable or random vector,
it has a mean, covariance, and distribution termed the sampling distribution; see
Definition 4.38. We will show that with appropriate assumptions regarding the
distribution of $\varepsilon$, $E(\hat{\mathsf{q}}) = q_0$, and the covariance will quantify the variability of the
errors. Furthermore, confidence limits for the sampling distribution can be used to
quantify the accuracy of the estimation process.

What the sampling distribution does not do is provide a distribution for the
model parameters, since $q_0$ is not a random variable in frequentist inference. We will
illustrate that, for certain problems, the sampling distribution coincides with the
parameter distribution constructed using Bayesian techniques. This makes it tempting
to propagate the sampling distribution through the model, using the techniques
of Chapters 9 and 10, to quantify the model or response uncertainty. However,
this is problematic for two reasons. The first is that there is no convergence theory
specifying an asymptotic relation between the sampling distribution and the parameter
distribution, which relies on Bayesian assumptions. Second, the sampling distribution
is Gaussian, which limits its accuracy for quantifying non-Gaussian parameter
distributions. Hence this approach should be avoided unless additional analysis
indicates an equivalence between the two distributions.

There are two alternatives. From a frequentist perspective, one can assume
parametric forms (e.g., Gaussian or Johnson distributions) for the densities associated
with model parameters and estimate the augmented parameter set using
moment or distribution matching techniques [156, 157, 243]. For model responses
of the form (7.3), this requires that errors $\varepsilon$ be characterized from independent experiments.
We do not provide further details about this approach but rather refer
the reader to the cited references. Alternatively, the Bayesian techniques detailed
in Chapter 8 can be used to construct parameter densities and moments that can
be directly propagated through models.


The estimators $\hat{\mathsf{q}}$ can be determined explicitly only for linear parameter dependencies.
Whereas applications such as convolution models for acoustics, image
processing, and X-ray tomography yield linearly parameterized models, general models
typically exhibit a nonlinear dependence on $q$. To illustrate the derivation of
relevant theory, we consider the linear regression (linear parameterization) problem
first in Section 7.2. We return to the general problem posed here in Section 7.3.


7.2 Linear Regression

We illustrate here fundamental results regarding linear regression to motivate corresponding
theory for the nonlinear least squares problem (7.13). Additional details
can be found in […].

We consider the statistical model

$$\Upsilon = Xq_0 + \varepsilon, \tag{7.14}$$

where $\Upsilon = [\Upsilon_1, \ldots, \Upsilon_n]^T$ and $\varepsilon = [\varepsilon_1, \ldots, \varepsilon_n]^T$ are random vectors and the $n \times p$
design matrix $X$ is considered deterministic and known. We let $q_0$ denote the vector
of true but unknown parameters and let $\upsilon = [\upsilon_1, \ldots, \upsilon_n]^T$ denote realizations or
observations from an experiment in which the realized errors are $e = [e_1, \ldots, e_n]^T$.
Throughout this discussion, we assume that there are more measurements than
parameters so that $n > p$.

Assumption 7.2. We make the assumption that errors are unbiased and iid with
variance $\sigma_0^2$; hence for $i = 1, \ldots, n$,

(i) $E(\varepsilon_i) = 0$, and

(ii) $\mathrm{var}(\varepsilon_i) = \sigma_0^2$, $\mathrm{cov}(\varepsilon_i, \varepsilon_j) = 0$ for $i \ne j$.

In accordance with frequentist assumptions, the error variance $\sigma_0^2$ is assumed fixed
but unknown. At this point, we make no additional assumptions regarding the error
distribution.

Our first objective is to construct unbiased estimators $\hat{\mathsf{q}}$ and $\hat{\sigma}^2$ for the unknown
parameters $q_0$ and $\sigma_0^2$.

7.2.1 Parameter Estimator and Estimate

To construct an estimator $\hat{\mathsf{q}}$ for $q_0$, we seek the $q$ that minimizes the OLS functional

$$J(q) = (\Upsilon - Xq)^T (\Upsilon - Xq). \tag{7.16}$$

If $q$ were scalar-valued, we would optimize (7.16) by setting the derivative with
respect to $q$ equal to 0 and solving for $q$. For vector-valued problems, this is achieved
using the gradient of $J$ with respect to $q$. Specifically, one sets

$$\nabla_q J = 2[\nabla_q (\Upsilon - Xq)^T][\Upsilon - Xq] = 0,$$

where

$$\nabla_q (\Upsilon - Xq)^T = -\nabla_q q^T X^T = -X^T,$$

to obtain the normal equations $X^T X q = X^T \Upsilon$ and hence the least squares estimator

$$\hat{\mathsf{q}}_{OLS} = (X^T X)^{-1} X^T \Upsilon. \tag{7.17}$$

The realization

$$\hat{q}_{OLS} = (X^T X)^{-1} X^T \upsilon \tag{7.18}$$

is the least squares estimate for the unknown true parameter $q_0$.

Remark 7.3. Throughout this chapter, we will discuss only OLS estimators and
estimates. Hence to simplify notation, we will drop the subscript OLS and let
$\hat{\mathsf{q}} = \hat{\mathsf{q}}_{OLS}$ and $\hat{q} = \hat{q}_{OLS}$ denote the least squares estimator and estimate.
Whereas the normal equations (7.17) provide an analytic minimum for (7.16),
they are typically ill-conditioned for moderate to large numbers of parameters.
Hence in practice, it is often numerically advantageous to solve the minimization
problem (7.16) directly to avoid inaccurate results associated with numerically solving
ill-conditioned linear systems.
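A short NumPy sketch of the two routes: solving the normal equations directly versus solving the least squares problem through a QR factorization of $X$, which avoids forming $X^T X$ and squaring the condition number. The monomial design matrix and noise-free data are hypothetical test inputs, and `numpy.linalg.lstsq` would be an equally reasonable choice for the second route.

```python
import numpy as np

def ols_normal_equations(X, y):
    """Solve (X^T X) q = X^T y; cond(X^T X) = cond(X)^2."""
    return np.linalg.solve(X.T @ X, X.T @ y)

def ols_qr(X, y):
    """Minimize ||y - X q|| via X = QR, then solve R q = Q^T y."""
    Q, R = np.linalg.qr(X)                    # reduced QR; R is p x p upper triangular
    return np.linalg.solve(R, Q.T @ y)        # a dedicated triangular solver also works

# Hypothetical example: monomial design matrix on [0, 1]
x = np.linspace(0.0, 1.0, 30)
X = np.vander(x, 6, increasing=True)          # columns 1, x, ..., x^5
q0 = np.arange(1.0, 7.0)
y = X @ q0                                    # noise-free data, so q0 is the exact solution
print(np.linalg.cond(X), np.linalg.cond(X.T @ X))
print(np.abs(ols_qr(X, y) - q0).max(), np.abs(ols_normal_equations(X, y) - q0).max())
```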

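7.2.2 Parameter Estimator Properties

Result 7.4. The parameter estimator $\hat{\mathsf{q}}$ has the mean and covariance matrix

(i) $E(\hat{\mathsf{q}}) = q_0$, and

(ii) $V(\hat{\mathsf{q}}) = \sigma_0^2 (X^T X)^{-1}$. (7.19)

Relation (i) follows directly from (7.17) since $E(\Upsilon) = Xq_0$, so that

$$E(\hat{\mathsf{q}}) = E[(X^T X)^{-1} X^T \Upsilon] = (X^T X)^{-1} X^T X q_0 = q_0.$$

Hence $\hat{\mathsf{q}}$ provides an unbiased estimate for the true parameter. To establish the
covariance relation, we let $A = (X^T X)^{-1} X^T$ and note that

$$\begin{aligned} V(\hat{\mathsf{q}}) &= E[(\hat{\mathsf{q}} - q_0)(\hat{\mathsf{q}} - q_0)^T] \\ &= E[(q_0 + A\varepsilon - q_0)(q_0 + A\varepsilon - q_0)^T], \quad \text{since } \hat{\mathsf{q}} = A\Upsilon = A(Xq_0 + \varepsilon) = q_0 + A\varepsilon, \\ &= A E(\varepsilon \varepsilon^T) A^T = \sigma_0^2 A A^T \\ &= \sigma_0^2 (X^T X)^{-1}. \end{aligned}$$

As noted previously, the error variance is assumed to be fixed but unknown.
Hence to employ (7.19) to estimate the parameter covariance, we must construct
an unbiased estimator $\hat{\sigma}^2$ for $\sigma_0^2$.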
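Result 7.4 can be checked by simulation. The sketch below assumes a hypothetical straight-line design on $[0, 1]$, Gaussian errors (a special case of Assumption 7.2), and 20,000 synthetic experiments; the sample mean and covariance of the estimates should approach $q_0$ and $\sigma_0^2 (X^T X)^{-1}$.

```python
import numpy as np

rng = np.random.default_rng(6)
n, sigma0 = 20, 0.5
x = np.linspace(0.0, 1.0, n)
X = np.column_stack([np.ones(n), x])             # hypothetical design: y = q1 + q2 x
q0 = np.array([1.0, 2.0])
XtX_inv = np.linalg.inv(X.T @ X)

M = 20_000                                       # number of synthetic experiments
E = rng.normal(0.0, sigma0, size=(M, n))         # iid errors
Ups = X @ q0 + E                                 # each row is one realization of Upsilon
q_hat = Ups @ X @ XtX_inv                        # row-wise (X^T X)^{-1} X^T upsilon
print(q_hat.mean(axis=0))                        # close to q0 (unbiased)
print(np.cov(q_hat, rowvar=False))               # close to sigma0^2 (X^T X)^{-1}
print(sigma0**2 * XtX_inv)
```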

7.2.3 Error Variance Estimator

Result 7.5. The unbiased error variance estimator is

$$\hat{\sigma}^2 = \frac{1}{n - p} R^T R, \tag{7.20}$$

where

$$R = \Upsilon - X\hat{\mathsf{q}} \tag{7.21}$$

denotes the residual estimator.

To obtain this result, we first note that the residual can be expressed as

$$R = (I_n - H)\Upsilon,$$

where $I_n$ denotes the $n \times n$ identity matrix and

$$H = X(X^T X)^{-1} X^T. \tag{7.22}$$
It is straightforward to show that $H$ satisfies the properties

$$ \begin{aligned} H^T &= H \quad \text{(symmetric)}, \\ H^2 &= H \quad \text{(idempotent)}, \\ (I_n - H)^2 &= I_n - H, \\ (I_n - H)X &= 0. \end{aligned} \qquad (7.22) $$

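From (7.14) and (7.22), it follows that

$$ R = (I_n - H)\varepsilon, $$

so that

$$ R^TR = \varepsilon^T(I_n - H)\varepsilon. \qquad (7.23) $$

If we generically denote the $ij$ entry of $I_n - H$ by $h_{ij}$, the quadratic form (7.23) can be expressed as

$$ R^TR = \sum_{i=1}^n\sum_{j=1}^n h_{ij}\varepsilon_i\varepsilon_j. $$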

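It then follows that

$$ \begin{aligned} E(R^TR) &= \sum_{i=1}^n\sum_{j=1}^n h_{ij}E(\varepsilon_i\varepsilon_j) \\ &= \sum_{i=1}^n\sum_{j=1}^n h_{ij}\,\mathrm{cov}(\varepsilon_i, \varepsilon_j), \quad \text{follows from (4.14) with } E(\varepsilon_i) = E(\varepsilon_j) = 0 \\ &= \sum_{i=1}^n h_{ii}\,\mathrm{var}(\varepsilon_i), \quad \varepsilon_i \text{ independent} \\ &= \sigma_0^2\,\mathrm{tr}(I_n - H), \quad \varepsilon_i \text{ identically distributed with variance } \sigma_0^2. \end{aligned} $$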

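Since the trace operator satisfies the properties $\mathrm{tr}(B - D) = \mathrm{tr}(B) - \mathrm{tr}(D)$ and $\mathrm{tr}(BD) = \mathrm{tr}(DB)$, it follows that

$$ \begin{aligned} \mathrm{tr}(I_n - H) &= \mathrm{tr}(I_n) - \mathrm{tr}[X(X^TX)^{-1}X^T] \\ &= n - \mathrm{tr}[(X^TX)^{-1}X^TX] \\ &= n - \mathrm{tr}(I_p) = n - p. \end{aligned} \qquad (7.24) $$

Thus $\hat{\sigma}^2 = \frac{1}{n-p}R^TR$ is an unbiased estimator for $\sigma_0^2$. Furthermore, because $H$ is idempotent, its eigenvalues are 0 or 1, and we can conclude from (7.24) that exactly $p$ of them equal 1.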
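The projection-matrix properties (7.22) and the trace identity (7.24) are easy to confirm numerically. The following sketch assumes NumPy and a randomly generated, hypothetical full-rank design matrix. It checks symmetry, idempotence, $(I_n - H)X = 0$, $\mathrm{tr}(I_n - H) = n - p$, and that exactly $p$ eigenvalues of $H$ equal 1.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 12, 3
X = rng.standard_normal((n, p))            # hypothetical full-rank design
H = X @ np.linalg.solve(X.T @ X, X.T)      # H = X (X^T X)^{-1} X^T
M = np.eye(n) - H                          # I_n - H

eig = np.linalg.eigvalsh(H)                # H is symmetric, so eigenvalues are real
print("symmetric  :", np.allclose(H, H.T))
print("idempotent :", np.allclose(H @ H, H))
print("(I-H)X = 0 :", np.allclose(M @ X, 0.0))
print("tr(I-H) =", np.trace(M), "  n - p =", n - p)
print("eigenvalues of H:", np.round(eig, 10))
```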


Example 7.6. Consider the height-weight data from the 1975 World Almanac and Book of Facts that is compiled in Table 7.1. To model this data, we employ the quadratic relation

$$ \Upsilon_i = q_1 + q_2(x_i/12) + q_3(x_i/12)^2 + \varepsilon_i, \qquad (7.25) $$
Height (in)    Weight (lbs)
58             115
59             117
60             120
61             123
62             126
63             129
64             132
65             135
66             139
67             142
68             146
69             150
70             154
71             159
72             164

Table 7.1. Height-weight data from the 1975 World Almanac and Book of Facts [73].


where $x_i$ is the height in inches and $\Upsilon_i$ is the corresponding weight. Solution of the normal equations (7.18) yields the parameter values $\hat{q} = [261.88, -88.18, 11.96]^T$. We note that the condition number of the $3 \times 3$ matrix $X^TX$ is $6.7 \times 10^7$, thus illustrating the ill-conditioning of the normal equations. The variance estimate provided by (7.20) is $\hat{\sigma}^2 = 0.15$, which yields the covariance matrix estimate


$$ V = \begin{bmatrix} 634.88 & -235.04 & 21.66 \\ -235.04 & 87.09 & -8.03 \\ 21.66 & -8.03 & 0.74 \end{bmatrix}. $$


The estimated parameter values, plus and minus two standard deviations, are thus

$$ \hat{q}_1 = 261.88 \pm 50.39, \qquad \hat{q}_2 = -88.18 \pm 18.66, \qquad \hat{q}_3 = 11.96 \pm 1.72, $$

so that

$$ q_1 \in [211.48, 312.27], \qquad q_2 \in [-106.84, -69.51], \qquad q_3 \in [10.24, 13.68]. \qquad (7.26) $$
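Example 7.6 can be reproduced in a few lines. This sketch assumes NumPy and uses the values in Table 7.1. It computes the estimate with `np.linalg.lstsq` instead of explicitly inverting $X^TX$, then forms $\hat{\sigma}^2$ from (7.20), the covariance estimate $\hat{\sigma}^2(X^TX)^{-1}$, and the two-standard-deviation bounds.

```python
import numpy as np

# Table 7.1: heights (in) and weights (lbs)
height = np.arange(58, 73, dtype=float)
weight = np.array([115, 117, 120, 123, 126, 129, 132, 135,
                   139, 142, 146, 150, 154, 159, 164], dtype=float)

u = height / 12.0                                   # model (7.25) uses x_i / 12
X = np.column_stack([np.ones_like(u), u, u**2])
n, p = X.shape

q_hat, *_ = np.linalg.lstsq(X, weight, rcond=None)  # least squares estimate
r = weight - X @ q_hat                              # residual estimate
sigma2 = r @ r / (n - p)                            # variance estimate (7.20)
V = sigma2 * np.linalg.inv(X.T @ X)                 # sigma^2 (X^T X)^{-1}
sd = np.sqrt(np.diag(V))

print("q_hat   =", np.round(q_hat, 2))
print("sigma^2 =", round(sigma2, 2))
print("cond(X^T X) = %.1e" % np.linalg.cond(X.T @ X))
for k in range(p):
    print(f"q{k + 1}: {q_hat[k]:.2f} +/- {2 * sd[k]:.2f}")
```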


7.2.4 Sampling Distribution for $\hat{Q}$

As detailed in Section 4.2, the estimator $\hat{Q}$ has a distribution, termed the sampling distribution, which we will use to construct confidence intervals for the estimation process. The assumptions required to specify a sampling distribution are more stringent than those in Assumption 7.2. They require either that errors be normally distributed or that samples be sufficiently large that the central limit theorem can be invoked for averaged error relations.


Assumption 7.7. The sampling distribution for $\hat{Q}$ can be directly specified for problems in which errors are iid and $\varepsilon_i \sim N(0, \sigma_0^2)$, where $\sigma_0$ is fixed but likely unknown.


Property 7.8 (Sampling Distribution for $\hat{Q}$). With Assumption 7.7, $\hat{Q}$ has the sampling distribution $\hat{Q} \sim N(q_0, \sigma_0^2(X^TX)^{-1})$. Furthermore, if we let $v_{kk}$ denote the $k$th diagonal element of $(X^TX)^{-1}$ and $q_{0k}$ denote the $k$th element of the true parameter vector $q_0$, then $\hat{Q}_k \sim N(q_{0k}, \sigma_0^2v_{kk})$.

To verify this property, we note from [96] that because each component $\hat{Q}_k$ is a linear combination of independent normal random variables, it follows that $\hat{Q}$ has a joint multivariate normal distribution. When combined with the fact that $E(\hat{Q}) = q_0$ and $\mathrm{cov}(\hat{Q}) = \sigma_0^2(X^TX)^{-1}$, it follows that $\hat{Q} \sim N(q_0, \sigma_0^2(X^TX)^{-1})$.
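Property 7.8 can be checked by simulation. One generates many data sets from (7.14) with normal errors, computes $\hat{q}$ for each, and compares the empirical mean and covariance of the estimates with $q_0$ and $\sigma_0^2(X^TX)^{-1}$. The straight-line design, true parameters, noise level, and number of replicates below are hypothetical choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n, sigma0 = 20, 0.5
t = np.linspace(0.0, 1.0, n)
X = np.column_stack([np.ones(n), t])          # hypothetical straight-line design
q0 = np.array([1.0, 2.0])                     # hypothetical true parameters
A = np.linalg.solve(X.T @ X, X.T)             # A = (X^T X)^{-1} X^T

N = 20000                                     # number of synthetic data sets
eps = sigma0 * rng.standard_normal((N, n))    # iid N(0, sigma0^2) errors
Y = X @ q0 + eps                              # each row is one realization
Q = Y @ A.T                                   # each row is one estimate q_hat

emp_mean = Q.mean(axis=0)
emp_cov = np.cov(Q, rowvar=False)
theory_cov = sigma0**2 * np.linalg.inv(X.T @ X)
print("empirical mean:", emp_mean, " q0:", q0)
print("empirical covariance:\n", emp_cov)
print("sigma0^2 (X^T X)^{-1}:\n", theory_cov)
```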
For numerous applications, errors may be iid with variance $\sigma_0^2$ but not normally distributed. For sufficiently large sample sizes, asymptotic theory yields a result similar to Property 7.8.


Property 7.9 (Asymptotic Sampling Distribution for $\hat{Q}$). Consider the model (7.14) with errors which are iid with variance $\sigma_0^2$. For sufficiently large $n$, the sampling distribution for $\hat{Q}$ is asymptotically normal, which we denote by $\hat{Q} \sim N(q_0, \sigma_0^2(X^TX)^{-1})$.
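Rather than provide a complete proof of Property 7.9, we instead summarize the approach and refer the reader to [219] for additional details. We first note that substitution of (7.14) into (7.17) yields $\hat{Q} - q_0 = (X^TX)^{-1}X^T\varepsilon$, so that

$$ \sqrt{n}(\hat{Q} - q_0) = \left(\frac{1}{n}X^TX\right)^{-1}\frac{1}{\sqrt{n}}X^T\varepsilon. $$

Because the first right-hand side term can be interpreted as an average, the law of large numbers is used to establish that

$$ \frac{1}{n}X^TX \xrightarrow{P} \Sigma, $$

where $\Sigma$ is positive definite. Since $E\left(\frac{1}{\sqrt{n}}X^T\varepsilon\right) = 0$ and

$$ \mathrm{cov}\left(\frac{1}{\sqrt{n}}X^T\varepsilon\right) = \frac{1}{n}X^TE(\varepsilon\varepsilon^T)X = \sigma_0^2\,\frac{1}{n}X^TX \to \sigma_0^2\Sigma, $$

the central limit theorem, discussed in Section 4.4, is then invoked to establish that

$$ \frac{1}{\sqrt{n}}X^T\varepsilon \xrightarrow{D} Z, $$

where $Z \sim N(0, \sigma_0^2\Sigma)$, so that $\sqrt{n}(\hat{Q} - q_0) \xrightarrow{D} N(0, \sigma_0^2\Sigma^{-1})$. Finally, one shows that $\frac{1}{n}X^TX$ is a strongly consistent estimator of $\Sigma$ to obtain the asymptotic result.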

An obvious practical question concerns the size $n$ required to justify using these asymptotic results. This is problem-dependent, and alternative methods, such as Bayesian analysis, may be required to establish the normality of distributions when sample sizes are small.


Confidence Intervals

It was shown in Section 3.3 that chi-squared and t-distributions are required to construct confidence intervals. This is established for our estimators in the next two properties.

Property 7.10. For $\hat{\sigma}^2$ given by (7.20), the random variable $\frac{(n-p)\hat{\sigma}^2}{\sigma_0^2}$ has a chi-squared distribution with $n - p$ degrees of freedom.
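To establish this, we note that

$$ \begin{aligned} \frac{(n-p)\hat{\sigma}^2}{\sigma_0^2} &= \frac{R^TR}{\sigma_0^2} \\ &= \frac{1}{\sigma_0^2}\varepsilon^T(I_n - H)\varepsilon \\ &= \frac{1}{\sigma_0^2}\varepsilon^TU\Lambda U^T\varepsilon, \quad I_n - H = U\Lambda U^T \text{ since symmetric} \\ &= \frac{1}{\sigma_0^2}(U^T\varepsilon)^T\Lambda(U^T\varepsilon). \end{aligned} $$

Since $\mathrm{tr}(I_n - H) = \mathrm{rank}(I_n - H) = n - p$, we can express $\Lambda$ as

$$ \Lambda = \begin{bmatrix} I_{n-p} & 0 \\ 0 & 0 \end{bmatrix}, $$

where $I_{n-p}$ is the $(n-p) \times (n-p)$ identity matrix. Moreover, it is proven in [96] that since $U$ is an orthogonal matrix and $\varepsilon \sim N(0, \sigma_0^2I_n)$, then $u = U^T\varepsilon/\sigma_0$ is a vector of independent $N(0, 1)$ random variables. Because

$$ \frac{(n-p)\hat{\sigma}^2}{\sigma_0^2} = u^T\Lambda u = \sum_{i=1}^{n-p}u_i^2 $$

is the sum of squares of $n - p$ independent $N(0, 1)$ random variables, it thus has a chi-squared distribution with $n - p$ degrees of freedom.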
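Property 7.10 can be checked empirically in the same spirit. The sketch below uses a hypothetical quadratic design and noise level. It forms $R = (I_n - H)\varepsilon$ for many error draws and compares the sample mean and variance of $(n-p)\hat{\sigma}^2/\sigma_0^2$ with the chi-squared values $n - p$ and $2(n - p)$.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, sigma0 = 10, 3, 0.3
grid = np.arange(n, dtype=float)
X = np.column_stack([np.ones(n), grid, grid**2])   # hypothetical quadratic design
M = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)  # I_n - H (symmetric)

N = 50000
eps = sigma0 * rng.standard_normal((N, n))
R = eps @ M.T                      # each row is a residual vector (I_n - H) eps
sigma2_hat = np.sum(R**2, axis=1) / (n - p)
S = (n - p) * sigma2_hat / sigma0**2

# A chi-squared(n - p) variable has mean n - p and variance 2(n - p)
print("sample mean    :", S.mean(), " expected:", n - p)
print("sample variance:", S.var(), " expected:", 2 * (n - p))
```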


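Property 7.11. The random variable

$$ T_k = \frac{\hat{Q}_k - q_{0k}}{\hat{\sigma}\sqrt{v_{kk}}} $$

has a t-distribution with $n - p$ degrees of freedom.

To verify Property 7.11, we note from Property 7.8 that

$$ Z = \frac{\hat{Q}_k - q_{0k}}{\sigma_0\sqrt{v_{kk}}} \sim N(0, 1). $$

Under Assumption 7.7, $\hat{Q}$ and $\hat{\sigma}^2$ are independent, so it follows from Property 7.10 that

$$ T_k = \frac{\hat{Q}_k - q_{0k}}{\hat{\sigma}\sqrt{v_{kk}}} = \frac{(\hat{Q}_k - q_{0k})/(\sigma_0\sqrt{v_{kk}})}{\sqrt{\frac{(n-p)\hat{\sigma}^2}{\sigma_0^2}\Big/(n-p)}} = \frac{Z}{\sqrt{\chi^2/(n-p)}}, \quad Z \sim N(0, 1), \ \chi^2 \sim \chi^2(n - p), $$

has a t-distribution with $n - p$ degrees of freedom.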
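The practical consequence of Property 7.11 is that intervals built with the t quantile attain their nominal coverage even when $\sigma_0$ is replaced by $\hat{\sigma}$ and $n - p$ is small. The simulation below is a sketch under hypothetical settings ($n = 12$, $p = 2$, straight-line design). It uses the tabulated value $t_{10,0.975} \approx 2.228$ and contrasts it with the normal multiplier 1.96, which undercovers.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, sigma0 = 12, 2, 0.4
X = np.column_stack([np.ones(n), np.linspace(0.0, 1.0, n)])  # hypothetical design
q0 = np.array([0.5, -1.0])                                   # hypothetical true parameters
XtX_inv = np.linalg.inv(X.T @ X)
A = XtX_inv @ X.T                                            # q_hat = A v
t_crit = 2.228                                               # t_{10,0.975} from a t-table

N, k = 20000, 1                                 # number of data sets; check component q_2
data = X @ q0 + sigma0 * rng.standard_normal((N, n))   # each row is one data set
Q = data @ A.T                                  # estimates, one per row
R = data - Q @ X.T                              # residuals, one per row
s = np.sqrt(np.sum(R**2, axis=1) / (n - p))     # sigma_hat for each data set
se = s * np.sqrt(XtX_inv[k, k])                 # standard error SE_k
err = np.abs(Q[:, k] - q0[k])

coverage_t = np.mean(err <= t_crit * se)
coverage_z = np.mean(err <= 1.96 * se)
print("coverage using t quantile:", coverage_t)   # close to 0.95
print("coverage using 1.96      :", coverage_z)   # noticeably below 0.95
```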

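To construct a $(1 - \alpha) \times 100\%$ confidence interval, we employ the techniques of Example 4.33, with $T_k = \frac{\hat{Q}_k - q_{0k}}{\hat{\sigma}\sqrt{v_{kk}}}$, to obtain

$$ P\left[\hat{Q}_k - t_{n-p,1-\alpha/2}\,\hat{\sigma}\sqrt{v_{kk}} < q_{0k} < \hat{Q}_k + t_{n-p,1-\alpha/2}\,\hat{\sigma}\sqrt{v_{kk}}\right] = 1 - \alpha. $$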
We then employ the parameter estimate $\hat{q} = (X^TX)^{-1}X^T\upsilon$ and variance estimate $\hat{\sigma}^2 = \frac{1}{n-p}r^Tr$, where $r = \upsilon - X\hat{q}$, to obtain

$$ \left[\hat{q}_k - t_{n-p,1-\alpha/2}\,\hat{\sigma}\sqrt{v_{kk}},\ \hat{q}_k + t_{n-p,1-\alpha/2}\,\hat{\sigma}\sqrt{v_{kk}}\right]. \qquad (7.27) $$

We note that this is often expressed as

$$ \left[\hat{q}_k - t_{n-p,1-\alpha/2}\cdot SE_k,\ \hat{q}_k + t_{n-p,1-\alpha/2}\cdot SE_k\right], \qquad (7.28) $$

where $SE_k = \hat{\sigma}\sqrt{v_{kk}}$ is termed the standard error. To construct (7.27) or (7.28), one uses a table of t-distributions or a t-value calculator to look up or compute values of $t_{n-p,1-\alpha/2}$ for specified values of $n$, $p$, and $\alpha$. We caution the reader that whereas most tables are compiled in terms of one tail ($1 - \alpha/2$), some provide values for both tails ($1 - \alpha$). Hence care must be taken to employ an $\alpha$ consistent with the table.

Example 7.12. We revisit Example 7.6 and use the t-distribution to construct 95% confidence intervals for the parameters $q_1$, $q_2$, and $q_3$ in the quadratic model (7.25).

Here we have $n = 15$ observations and $p = 3$ parameters. For $\alpha = 0.05$, we obtain the value $t_{n-p,1-\alpha/2} = t_{12,0.975} \approx 2.2$ from a table of t-values. This yields the 95% confidence intervals

$$ q_1 \in [206.45, 317.31], \qquad q_2 \in [-108.71, -67.65], \qquad q_3 \in [10.07, 13.86]. $$

These intervals are slightly larger than those in (7.26). Although the intervals in (7.26) reflect $2\sigma$, or approximately 95.45%, confidence intervals, the t-distribution has heavier tails than the normal distribution, as illustrated in Figure 4.3(b), so the multiplier 2.2 exceeds the factor 2 used in (7.26).
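The arithmetic in Example 7.12 can be reproduced from the estimates and covariance diagonal reported in Example 7.6. The sketch assumes NumPy and uses the rounded table value 2.2 from the text. If SciPy is available, `scipy.stats.t.ppf(0.975, 12)` returns the more precise value of about 2.179, which gives slightly narrower intervals.

```python
import numpy as np

# Estimates and covariance diagonal reported in Example 7.6
q_hat = np.array([261.88, -88.18, 11.96])
var_diag = np.array([634.88, 87.09, 0.74])    # diagonal of V = sigma^2 (X^T X)^{-1}
t_val = 2.2                                   # rounded table value used in the text

se = np.sqrt(var_diag)                        # standard errors SE_k
lower = q_hat - t_val * se
upper = q_hat + t_val * se
for k in range(3):
    print(f"q{k + 1} in [{lower[k]:.2f}, {upper[k]:.2f}]")
```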

The statistical model, estimators, and statistical properties of the linear regression model are summarized in Table 7.2. This provides motivation and a basis for comparison for the nonlinear theory summarized in the next section.

7.3 Nonlinear Parameter Estimation Problem

We return to the evolutionary process model (7.1), stationary process model (7.2), and algebraic model (7.4), which exhibit nonlinear parameter dependencies, along with the associated statistical model

$$ \Upsilon = f(q_0) + \varepsilon. \qquad (7.29) $$

The model responses $f(q_0)$ for the three regimes are summarized in (7.9). As before, we take $q \in \mathbb{R}^p$ and let $q_0$ designate the true but unknown parameter that generates the response $\upsilon \in \mathbb{R}^n$. As in Section 7.2, we assume that there are more measurements than parameters so that $n > p$. We let $\mathcal{Q}$ denote the admissible parameter space and $\hat{\mathcal{Q}}$ denote the space associated with the estimator $\hat{Q}$. Since both specify admissible parameter values, $\mathcal{Q}$ and $\hat{\mathcal{Q}}$ will coincide for reasonable estimators.
Statistical Model:
    $\Upsilon = Xq_0 + \varepsilon$
    $\upsilon = Xq_0 + \varepsilon$ (realization)
Assumptions: $E(\varepsilon_i) = 0$, $\varepsilon_i$ iid with $\mathrm{var}(\varepsilon_i) = \sigma_0^2$
Least Squares Estimator and Estimate:
    $\hat{Q} = (X^TX)^{-1}X^T\Upsilon$, $E(\hat{Q}) = q_0$, $V(\hat{Q}) = \sigma_0^2(X^TX)^{-1}$
    $\hat{q} = (X^TX)^{-1}X^T\upsilon$
Error Variance Estimator and Estimate: $R = \Upsilon - X\hat{Q}$, $r = \upsilon - X\hat{q}$
    $\hat{\sigma}^2 = \frac{1}{n-p}R^TR$, $\hat{\sigma}^2 = \frac{1}{n-p}r^Tr$
Covariance Matrix Estimator and Estimate:
    $\hat{V}(\hat{Q}) = \hat{\sigma}^2(X^TX)^{-1}$, $V = \hat{\sigma}^2(X^TX)^{-1}$
Sampling Distribution: Requires $\varepsilon_i \sim N(0, \sigma_0^2)$ or sufficiently large $n$
    * $\hat{Q} \sim N(q_0, \sigma_0^2(X^TX)^{-1})$
    * $(1 - \alpha) \times 100\%$ Confidence Intervals: $v_{kk} = [(X^TX)^{-1}]_{kk}$
      $[\hat{q}_k - t_{n-p,1-\alpha/2}\,\hat{\sigma}\sqrt{v_{kk}},\ \hat{q}_k + t_{n-p,1-\alpha/2}\,\hat{\sigma}\sqrt{v_{kk}}]$

Table 7.2. Statistical model, estimators, and statistical properties of the linear regression model. As noted in Remark 7.3, $\hat{Q} = \hat{Q}_{OLS}$ and $\hat{q} = \hat{q}_{OLS}$ are the OLS estimator and estimate.


As noted in Section 7.1, the OLS estimate for the scalar case is obtained by minimizing the functional

$$ J(q) = \sum_{i=1}^n[\upsilon_i - f_i(q)]^2 \qquad (7.30) $$

subject to $q \in \mathcal{Q}$.

The difficulty is that analytic expressions for these minimizers generally cannot be obtained for nonlinearly parameterized problems. Instead, estimates must be obtained by numerically minimizing the least squares functional. Rather than provide a detailed analysis of the nonlinear problem, we summarize results that are analogous to the linear theory and refer readers to [24, 26, 219] for details regarding the nonlinear problem.
7.3.1 Parameter and Error Variance Estimators: Scalar Observations

Assumption 7.13. To construct parameter and error variance estimators, we require $\varepsilon_i$ to be iid with zero mean and fixed but unknown variance $\sigma_0^2$. With this assumption, it follows that $E(\Upsilon_i) = f_i(q_0)$ and $\mathrm{var}(\Upsilon_i) = \sigma_0^2$.

Parameter Estimator and Estimate

Unlike the linear case, which can be solved explicitly using the normal equations, the determination of an OLS estimator and estimate,

$$ \hat{Q}_{OLS} = \mathop{\mathrm{argmin}}_{q \in \mathcal{Q}}\sum_{i=1}^n[\Upsilon_i - f_i(q)]^2, \qquad \hat{q}_{OLS} = \mathop{\mathrm{argmin}}_{q \in \mathcal{Q}}\sum_{i=1}^n[\upsilon_i - f_i(q)]^2, \qquad (7.31) $$

requires numerical optimization techniques. The restriction $q \in \mathcal{Q}$ can produce constraints that must be enforced during optimization.

It was noted in Example 3.3 that parameter values for physical or biological problems can easily vary over 10 orders of magnitude. The direct optimization of (7.30) using standard software will be highly inefficient or fail for such problems. To address this, we employ scaled parameters $\tilde{q} = q\,./\,s$, where $./$ denotes componentwise division and $s$ is a vector whose components are the scale or magnitude of each parameter. Point estimates for the scaled parameters are then given by

$$ \tilde{q}_{OLS} = \mathop{\mathrm{argmin}}_{\tilde{q} \in \mathcal{Q}^*}\sum_{i=1}^n[\upsilon_i - f_i(\tilde{q}\,.{*}\,s)]^2, \qquad (7.32) $$

where $.{*}$ denotes componentwise multiplication and $\mathcal{Q}^*$ is the scaled admissible parameter space. We employ (7.32) for physical problems where the magnitudes of parameters vary significantly.
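One way to see why the scaling in (7.32) helps a gradient-based solver is through the matrix of derivatives $\partial f_i/\partial q_k$. By the chain rule, the derivatives with respect to $\tilde{q}_k = q_k/s_k$ are the columns of that matrix multiplied by $s_k$. The sketch below assumes NumPy and a hypothetical two-parameter exponential model whose parameters differ by six orders of magnitude, and it compares condition numbers before and after scaling.

```python
import numpy as np


def model(q, t):
    """Hypothetical model f(t; q) = q1 * exp(-q2 * t) with badly scaled parameters."""
    return q[0] * np.exp(-q[1] * t)


def sensitivity(q, t):
    """Analytic n x 2 matrix with columns df/dq1 and df/dq2."""
    e = np.exp(-q[1] * t)
    return np.column_stack([e, -q[0] * t * e])


t = np.linspace(0.0, 1000.0, 50)
q = np.array([2.0e3, 1.5e-3])        # parameters differing by six orders of magnitude
s = np.array([1.0e3, 1.0e-3])        # scale vector s (componentwise magnitudes)
q_scaled = q / s                     # q_tilde = q ./ s, entries of order one

X = sensitivity(q, t)                # derivatives with respect to q
X_scaled = X * s                     # chain rule: df/dq_tilde_k = (df/dq_k) * s_k
print("cond unscaled:", np.linalg.cond(X))
print("cond scaled  :", np.linalg.cond(X_scaled))
print("same model values:", np.allclose(model(q_scaled * s, t), model(q, t)))
```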

Remark 7.14. As noted in Remark 7.3, we will consider only OLS estimators and estimates in this chapter. To simplify notation, we thus take $\hat{Q} = \hat{Q}_{OLS}$ and $\hat{q} = \hat{q}_{OLS}$ for the remainder of the discussion.


One approach for obtaining least squares estimates is to employ stochastic optimization techniques such as genetic algorithms, simulated annealing, and differential evolution [232]. These techniques reduce the reliance on accurate initial parameter estimates and, in theory, provide global convergence. However, their convergence rates are slower (they may require infinite time for convergence) and, because they are nondeterministic, multiple optimizations can yield varying final parameter values.


Alternatively, one can employ gradient-based methods such as the interior-reflective Newton, Levenberg-Marquardt, or sequential quadratic programming algorithms employed in the MATLAB routines lsqnonlin and fmincon. The efficiency and success of gradient-based optimization methods are predicated on determining good initial parameter estimates and being able to accurately determine gradients. The advantage of gradient-based methods is that once they are near the minimum, they can exhibit quadratic convergence rates, which is vastly more efficient than stochastic optimization techniques. We note that one alternative is to employ hybrid approaches in which the stochastic techniques are used to provide reasonable initial estimates for the gradient-based algorithms, which then provide fast convergence to final parameter estimates.
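To make the gradient-based approach concrete, the following is a minimal Levenberg-Marquardt iteration written with NumPy. It is a teaching sketch, not the algorithm implemented in `lsqnonlin`. Each step solves the damped Gauss-Newton system $(S^TS + \lambda\,\mathrm{diag}(S^TS))\delta = S^Tr$, where $S$ is the $n \times p$ matrix of derivatives $\partial f_i/\partial q_k$ and $r$ is the residual. The damping $\lambda$ is decreased after a successful step and increased after a failed one. The exponential test model, noise level, and starting guess are hypothetical.

```python
import numpy as np


def levenberg_marquardt(f, jac, q_init, v, lam=1e-3, tol=1e-12, max_iter=200):
    """Minimize sum_i (v_i - f_i(q))^2 with a basic Levenberg-Marquardt iteration.

    f(q)   -> array of n model responses
    jac(q) -> n x p array of derivatives d f_i / d q_k
    """
    q = np.asarray(q_init, dtype=float)
    r = v - f(q)
    cost = r @ r
    for _ in range(max_iter):
        S = jac(q)
        A = S.T @ S
        step = np.linalg.solve(A + lam * np.diag(np.diag(A)), S.T @ r)
        if np.linalg.norm(step) <= tol * (np.linalg.norm(q) + tol):
            break                                # negligible step: converged
        r_trial = v - f(q + step)
        cost_trial = r_trial @ r_trial
        if cost_trial < cost:                    # success: accept, trust Gauss-Newton more
            q, r, cost = q + step, r_trial, cost_trial
            lam = max(lam / 10.0, 1e-12)
        else:                                    # failure: reject, damp toward gradient descent
            lam *= 10.0
    return q


# Usage on a hypothetical exponential-decay model with noisy synthetic data
t = np.linspace(0.0, 5.0, 40)


def f(q):
    return q[0] * np.exp(-q[1] * t)


def jac(q):
    e = np.exp(-q[1] * t)
    return np.column_stack([e, -q[0] * t * e])


rng = np.random.default_rng(4)
v = f([3.0, 0.7]) + 0.01 * rng.standard_normal(t.size)
q_hat = levenberg_marquardt(f, jac, [1.0, 0.2], v)
print("estimate:", q_hat)
```

Starting from a poor guess, the large-$\lambda$ steps behave like damped gradient steps; near the minimum, $\lambda$ shrinks and the iteration reduces to Gauss-Newton.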


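Parameter Estimator Mean and Variance

For the linear model with design matrix $X$, we showed in (7.19) that $E(\hat{Q}) = q_0$ and $V(\hat{Q}) = \sigma_0^2(X^TX)^{-1}$. In the nonlinear theory, linearization about $q_0$ yields the approximate covariance relations

$$ V(\hat{Q}) \approx \sigma_0^2[\mathcal{X}^T(q_0)\mathcal{X}(q_0)]^{-1} \approx \sigma_0^2[\mathcal{X}^T(\hat{q})\mathcal{X}(\hat{q})]^{-1}. \qquad (7.33) $$

Here $\mathcal{X}(q)$ denotes the $n \times p$ sensitivity matrix whose elements are

$$ \mathcal{X}_{ik}(q) = \frac{\partial f_i(q)}{\partial q_k}. \qquad (7.34) $$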


Sensitivity Matrix Construction

The sensitivity matrix can be constructed using three techniques: (i) finite difference approximations, (ii) solution of sensitivity equations, or (iii) automatic differentiation. Ideally, one would compare matrices resulting from at least two of the methods to verify results.

The simplest conceptually is to approximate the derivatives using finite difference relations

$$ \mathcal{X}_{ik}(q) = \frac{\partial f_i(q)}{\partial q_k} \approx \frac{f_i(q + h_k) - f_i(q)}{|h_k|}, \qquad (7.35) $$

where $h_k$ is a $p$-vector whose only nonzero entry is its $k$th element. The difficulty is that the accuracy of (7.35) is highly dependent on the choice of $h_k$, which also must be correctly scaled according to the magnitude of $q$. Hence the accuracy of results should be verified through comparison with the other techniques.
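The stepsize dependence of (7.35) is easy to see for the damped spring $\ddot{z} + C\dot{z} + Kz = 0$, $z(0) = 2$, $\dot{z}(0) = -C$, whose displacement and $C$-sensitivity are available in closed form (Example 7.15). The sketch below assumes NumPy, $K = 20.5$, $C = 1.5$, and a hypothetical time grid on $[0, 5]$. As $h$ decreases, the error first drops because truncation error shrinks, then grows again as floating-point cancellation dominates. A common rule of thumb is $h \approx \sqrt{\epsilon_{mach}}\,|q_k|$; it is offered here as a heuristic, not as a prescription from the text.

```python
import numpy as np

K = 20.5                                   # stiffness treated as known


def y(t, C):
    """Underdamped spring displacement 2 e^{-Ct/2} cos(sqrt(K - C^2/4) t)."""
    w = np.sqrt(K - C**2 / 4.0)
    return 2.0 * np.exp(-C * t / 2.0) * np.cos(w * t)


def dy_dC(t, C):
    """Closed-form sensitivity dy/dC."""
    w = np.sqrt(K - C**2 / 4.0)
    return np.exp(-C * t / 2.0) * (C * t / np.sqrt(4.0 * K - C**2) * np.sin(w * t)
                                   - t * np.cos(w * t))


def dy_dC_fd(t, C, h):
    """Forward difference (7.35) with stepsize h."""
    return (y(t, C + h) - y(t, C)) / h


t = np.linspace(0.0, 5.0, 501)             # hypothetical time grid
C0 = 1.5
exact = dy_dC(t, C0)
errs = {}
for h in [1e-2, 1e-4, 1e-6, 1e-8, 1e-10, 1e-12]:
    errs[h] = np.max(np.abs(dy_dC_fd(t, C0, h) - exact))
    print(f"h = {h:.0e}: max error = {errs[h]:.2e}")
```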

Sensitivity equations can be constructed using various techniques. In Chapter 14, we illustrate their formulation using Gateaux differentials. More formally, they can be constructed by differentiating the evolution equation $\frac{du}{dt} = g(t, u(t), q)$ with respect to the components $q_k$ of $q$, and switching the order of differentiation, to obtain

$$ \frac{d}{dt}u_{q_k} = \frac{\partial g}{\partial u}u_{q_k} + \frac{\partial g}{\partial q_k}, \qquad (7.36) $$

where $u_{q_k} = \frac{\partial u}{\partial q_k}$. The matrix component $\mathcal{X}_{ik}(q) = \frac{\partial y(t_i, q)}{\partial q_k}$ is easily constructed once one has numerically integrated (7.36) to obtain $u_{q_k}(t, q)$. This approach has the advantage that it eliminates the uncertainty associated with choosing stepsizes $h_k$ to provide accurate finite difference approximations. However, if the original system has $N$ differential equations, the solution of (7.36) will involve $Np$ additional differential equations. Moreover, the analytic differentiation of the original system to construct the sensitivity equations is often difficult for complex systems.

For certain problems, automatic differentiation (AD) codes can be used to construct the sensitivity equations in a form that can be directly incorporated in ODE software. In such cases, the use of AD software to construct the sensitivity matrix $\mathcal{X}(q)$ can avoid the inaccuracy associated with finite difference approximations and the potential for errors when formulating and solving the sensitivity equations.
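For the damped spring $\ddot{z} + C\dot{z} + Kz = 0$, $z(0) = 2$, $\dot{z}(0) = -C$, differentiating with respect to $C$ gives the sensitivity equation for $s = \partial z/\partial C$: $\ddot{s} + C\dot{s} + Ks = -\dot{z}$, with $s(0) = 0$ and $\dot{s}(0) = -1$ because $\dot{z}(0) = -C$. The sketch below integrates the state and the sensitivity together with a hand-written fourth-order Runge-Kutta scheme, chosen only to keep the example dependency-free; in practice one would use an ODE library. It then compares the result with the closed-form sensitivity. With $N = 2$ states and $p = 1$ parameter, the augmented system has $N + Np = 4$ equations. The values $K = 20.5$, $C = 1.5$, and the time step are assumptions.

```python
import numpy as np

K, C = 20.5, 1.5


def rhs(w):
    """Spring ODE augmented with its sensitivity equation for s = dz/dC."""
    z, zd, s, sd = w
    return np.array([zd,
                     -C * zd - K * z,
                     sd,
                     -C * sd - K * s - zd])    # d/dC of z'' + C z' + K z = 0


def rk4(w0, dt, n_steps):
    """Classical fourth-order Runge-Kutta; returns the state at every step."""
    out = np.empty((n_steps + 1, w0.size))
    w = w0.copy()
    out[0] = w
    for i in range(n_steps):
        k1 = rhs(w)
        k2 = rhs(w + 0.5 * dt * k1)
        k3 = rhs(w + 0.5 * dt * k2)
        k4 = rhs(w + dt * k3)
        w = w + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
        out[i + 1] = w
    return out


dt, n_steps = 1e-3, 5000
t = dt * np.arange(n_steps + 1)
w0 = np.array([2.0, -C, 0.0, -1.0])   # z(0)=2, z'(0)=-C, s(0)=0, s'(0)=-1
sol = rk4(w0, dt, n_steps)

omega = np.sqrt(K - C**2 / 4.0)
z_exact = 2.0 * np.exp(-C * t / 2) * np.cos(omega * t)
s_exact = np.exp(-C * t / 2) * (C * t / np.sqrt(4 * K - C**2) * np.sin(omega * t)
                                - t * np.cos(omega * t))
print("max |z_RK4 - z_exact| =", np.max(np.abs(sol[:, 0] - z_exact)))
print("max |s_RK4 - s_exact| =", np.max(np.abs(sol[:, 2] - s_exact)))
```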

Error Variance Estimator

Since the error variance $\sigma_0^2$ in (7.33) is unknown, we construct a variance estimator analogous to that in the linear case. Specifically, we consider the unbiased variance estimator and estimate

$$ \hat{\sigma}^2 = \frac{1}{n-p}R^TR, \qquad \hat{\sigma}^2 = \frac{1}{n-p}r^Tr, \qquad (7.37) $$

where $R_i = \Upsilon_i - f_i(\hat{Q})$ and $r_i = \upsilon_i - f_i(\hat{q})$ are the residual estimator and estimate. This yields the estimate

$$ V = \hat{\sigma}^2[\mathcal{X}^T(\hat{q})\mathcal{X}(\hat{q})]^{-1} \qquad (7.38) $$

for the covariance matrix.


Sampling Distribution

To specify a sampling distribution for $\hat{Q}$, we again require either Assumption 7.7, which stipulates that errors are iid and $\varepsilon_i \sim N(0, \sigma_0^2)$, or that $n$ is sufficiently large that we can invoke the central limit theorem in the sense of Property 7.9. This directly or asymptotically establishes that

$$ \hat{Q} \sim N\left(q_0, \sigma_0^2[\mathcal{X}^T(q_0)\mathcal{X}(q_0)]^{-1}\right), \qquad (7.39) $$

where the covariance matrix is approximated by (7.38).

Confidence Intervals

The construction of $(1 - \alpha) \times 100\%$ confidence intervals is analogous to the formulation (7.27) or (7.28) for the linearly parameterized model. If we let $v_{kk}$ denote the $k$th diagonal element of $[\mathcal{X}^T(\hat{q})\mathcal{X}(\hat{q})]^{-1}$, then the $(1 - \alpha) \times 100\%$ confidence interval is

$$ \left[\hat{q}_k - t_{n-p,1-\alpha/2}\,\hat{\sigma}\sqrt{v_{kk}},\ \hat{q}_k + t_{n-p,1-\alpha/2}\,\hat{\sigma}\sqrt{v_{kk}}\right], \qquad (7.40) $$

where $\hat{\sigma}$ is given by (7.37). As noted in Section 7.2.4, t-calculators or tables can be used to calculate or look up $t_{n-p,1-\alpha/2}$ for given values of $n$, $p$, and $\alpha$.

The properties of the least squares estimator $\hat{Q}$ for the nonlinear statistical model (7.13) are compiled in Table 7.3. These can be compared with analogous properties for the linear regression problem summarized in Table 7.2.
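Statistical Model:
    $\Upsilon = f(q_0) + \varepsilon$, $q \in \mathbb{R}^p$
    $\upsilon = f(q_0) + \varepsilon$ (realization)
Assumptions: $E(\varepsilon_i) = 0$, $\varepsilon_i$ iid with $\mathrm{var}(\varepsilon_i) = \sigma_0^2$
Least Squares Estimator and Estimate:
    $\hat{Q} = \mathop{\mathrm{argmin}}_{q \in \mathcal{Q}}\sum_{i=1}^n[\Upsilon_i - f_i(q)]^2$, $\hat{q} = \mathop{\mathrm{argmin}}_{q \in \mathcal{Q}}\sum_{i=1}^n[\upsilon_i - f_i(q)]^2$
Error Variance Estimator and Estimate: $R = \Upsilon - f(\hat{Q})$, $r = \upsilon - f(\hat{q})$
    $\hat{\sigma}^2 = \frac{1}{n-p}R^TR$, $\hat{\sigma}^2 = \frac{1}{n-p}r^Tr$
Covariance Matrix Estimator and Estimate:
    $\hat{V}(\hat{Q}) = \hat{\sigma}^2[\mathcal{X}^T(\hat{Q})\mathcal{X}(\hat{Q})]^{-1}$, $V = \hat{\sigma}^2[\mathcal{X}^T(\hat{q})\mathcal{X}(\hat{q})]^{-1}$
Statistical Properties: Requires $\varepsilon_i \sim N(0, \sigma_0^2)$ or sufficiently large $n$
    * $\hat{Q} \sim N(q_0, \sigma_0^2[\mathcal{X}^T(q_0)\mathcal{X}(q_0)]^{-1})$
    * $(1 - \alpha) \times 100\%$ Confidence Intervals: $v_{kk} = [(\mathcal{X}^T(\hat{q})\mathcal{X}(\hat{q}))^{-1}]_{kk}$
      $[\hat{q}_k - t_{n-p,1-\alpha/2}\,\hat{\sigma}\sqrt{v_{kk}},\ \hat{q}_k + t_{n-p,1-\alpha/2}\,\hat{\sigma}\sqrt{v_{kk}}]$

Table 7.3. Statistical model, estimators, and statistical properties of the nonlinearly parameterized model (7.13) with scalar observations. As noted in Remark 7.14, $\hat{Q} = \hat{Q}_{OLS}$ and $\hat{q} = \hat{q}_{OLS}$ are the OLS estimator and estimate.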


Example 7.15. Consider the spring model

$$ \ddot{z} + C\dot{z} + Kz = 0, \qquad z(0) = 2, \quad \dot{z}(0) = -C, \qquad (7.41) $$

with displacement observations so that

$$ y = z. $$

We showed in Example 3.2 that (7.41) has the solution

$$ z(t) = 2e^{-Ct/2}\cos\left(\sqrt{K - C^2/4}\cdot t\right) \qquad (7.42) $$

when $C^2 - 4K < 0$. We take $K = 20.5$ to be known and let $q = C$ be the parameter considered in the statistical analysis. We note that although the model exhibits a linear dependence on the states $z$ and $\dot{z}$, the dependence of $y(t, q)$ on $q$ is nonlinear.

To numerically generate synthetic data, we employ $C_0 = 1.5$ and add noise $\varepsilon_i \sim N(0, \sigma_0^2)$, where $\sigma_0 = 0.1$. The model and one realization of the data at $n = 501$ points are plotted in Figure 7.1(a), and the residuals are plotted in Figure 7.1(b). By construction, the errors are iid, and 94.4% of the residuals lie within the $2\sigma$ interval indicated by the horizontal lines.

Figure 7.1. (a) Synthetic data and modeled displacement and (b) residuals at $n = 501$ points.

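The $n \times 1$ sensitivity matrix (vector) is

$$ \mathcal{X}(q) = \left[\frac{\partial y}{\partial C}(t_1, q), \cdots, \frac{\partial y}{\partial C}(t_n, q)\right]^T, \qquad (7.43) $$

where

$$ \frac{\partial y}{\partial C} = e^{-Ct/2}\left[\frac{Ct}{\sqrt{4K - C^2}}\sin\left(\sqrt{K - C^2/4}\cdot t\right) - t\cos\left(\sqrt{K - C^2/4}\cdot t\right)\right] \qquad (7.44) $$

results from differentiating (7.42). The construction of $\mathcal{X}(q)$ by constructing and solving the corresponding sensitivity equations is addressed in Exercise 7.1.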
Because we know $C_0$, we obtain the covariance value

$$ V = \sigma_C^2 = \sigma_0^2[\mathcal{X}^T(q_0)\mathcal{X}(q_0)]^{-1} = 3.35 \times 10^{-4}, $$

so that $\sigma_C = 0.0183$. Since $\varepsilon_i \sim N(0, \sigma_0^2)$, the random variable $\hat{C}$ has the sampling distribution

$$ \hat{C} \sim N(C_0, \sigma_C^2), \qquad (7.45) $$

which is plotted in Figure 7.2. The parameter estimated by minimizing (7.31) for the data plotted in Figure 7.1 is $\hat{C} = 1.4792$, and the 95% confidence interval given by (7.40) is $[1.4433, 1.5150]$.
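The estimation steps of Example 7.15 can be sketched as follows. The observation times on $[0, 5]$ and the random seed are assumptions, so the estimate will differ from the value 1.4792 obtained for the data in Figure 7.1. The code minimizes (7.31) by a scalar Gauss-Newton iteration that uses the analytic sensitivity (7.44) and halves a step whenever the cost increases. It then forms (7.37), (7.38), and the interval (7.40) with $t_{499,0.975} \approx 1.965$.

```python
import numpy as np

K, C0, sigma0, n = 20.5, 1.5, 0.1, 501
t = np.linspace(0.0, 5.0, n)                  # hypothetical observation times


def y(C):
    """Model response (7.42) at the observation times."""
    omega = np.sqrt(K - C**2 / 4.0)
    return 2.0 * np.exp(-C * t / 2.0) * np.cos(omega * t)


def sens(C):
    """Sensitivity vector (7.43)-(7.44)."""
    omega = np.sqrt(K - C**2 / 4.0)
    return np.exp(-C * t / 2.0) * (C * t / np.sqrt(4.0 * K - C**2) * np.sin(omega * t)
                                   - t * np.cos(omega * t))


rng = np.random.default_rng(5)
v = y(C0) + sigma0 * rng.standard_normal(n)   # synthetic data

C = 1.2                                       # initial guess
for _ in range(100):
    r = v - y(C)
    X = sens(C)
    step = (X @ r) / (X @ X)                  # Gauss-Newton step for one parameter
    cost = r @ r
    while np.sum((v - y(C + step)) ** 2) > cost and abs(step) > 1e-14:
        step *= 0.5                           # halve the step if the cost increases
    C += step
    if abs(step) < 1e-12:
        break

r = v - y(C)
sigma2 = r @ r / (n - 1)                      # (7.37) with p = 1
V = sigma2 / (sens(C) @ sens(C))              # (7.38)
sd_C = np.sqrt(V)
t_crit = 1.965                                # t_{499, 0.975}
ci = (C - t_crit * sd_C, C + t_crit * sd_C)   # (7.40)
sd_C_true = sigma0 / np.sqrt(sens(C0) @ sens(C0))   # sigma_C using known C0 and sigma0

print(f"C_hat = {C:.4f}, sigma^2 = {sigma2:.4f}")
print(f"95% CI = [{ci[0]:.4f}, {ci[1]:.4f}], sigma_C(C0) = {sd_C_true:.4f}")
```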




Chapter 7. Frequentist Techniques for Parameter Estimation 



Figure 7.2. Sampling density $N(C_0, \sigma_C^2)$ for $\hat{C}$ and density constructed from 10,000 simulations.


It was noted in Sections 4.8.1 and 7.1 that in frequentist inference, the 95% confidence interval has the following interpretation in the context of parameter estimation: if the procedure is repeated $N$ times, approximately $0.95N$ of the computed intervals will contain the true parameter $q_0$. This is illustrated in Figure 4.11(a). To demonstrate this for the present example, we generated 10,000 sets of numerical data using the true parameter value $C_0$ and errors $\varepsilon_i \sim N(0, \sigma_0^2)$ with $\sigma_0 = 0.1$. For each data set, we optimized (7.31) to obtain a point estimate $\hat{C}$ and corresponding 95% confidence interval. In this set of numerical experiments, 9455 of the intervals contained $C_0$.

Using the 10,000 estimated values of $\hat{C}$, we used the kernel density estimation techniques discussed in Chapter 4 to construct the density that is plotted in Figure 7.2. As expected, the kernel density estimate matches the representation (7.45) for the sampling distribution.


Example 7.16. We showed in Example 3.5 that the boundary value problem

$$ \begin{aligned} \frac{d^2T_s}{dx^2} &= \frac{2(a+b)}{ab}\frac{h}{k}[T_s(x) - T_{amb}], \\ \frac{dT_s}{dx}(0) &= \frac{\Phi}{k}, \\ \frac{dT_s}{dx}(L) &= \frac{h}{k}[T_{amb} - T_s(L)] \end{aligned} $$

models the steady state temperature of an uninsulated rod with source heat flux $\Phi$ at $x = 0$ and ambient air temperature $T_{amb}$. The model parameters to be estimated and statistically analyzed are $q = [\Phi, h]$, where $h$ is the convective heat transfer coefficient.

The rod used in these experiments was aluminum with cross-sectional dimensions $a = b = 0.95$ cm and length $L = 70$ cm. The temperature measurements compiled in Table 3.2 were made at 15 equally spaced spatial locations $x_i = x_0 + (i - 1)\Delta x$, where $x_0 = 10$ cm and $\Delta x = 4$ cm. The observed solution is

$$ y_i(q) = T_s(x_i, q) = c_1(q)e^{-\gamma x_i} + c_2(q)e^{\gamma x_i} + T_{amb}, $$

where

$$ \gamma = \sqrt{\frac{2(a+b)h}{abk}} $$

and

$$ c_1(q) = -\frac{\Phi}{k\gamma}\left[\frac{e^{\gamma L}(h + k\gamma)}{e^{-\gamma L}(h - k\gamma) + e^{\gamma L}(h + k\gamma)}\right], \qquad c_2(q) = \frac{\Phi}{k\gamma} + c_1(q). $$

We suppress the parameter dependence of $\gamma$ to clarify the notation. We employ the thermal conductivity value $k = 2.37$ W/(cm °C) reported for aluminum and the measured ambient room temperature $T_{amb} = 21.29$ °C.
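The closed-form solution can be coded directly. This sketch assumes NumPy and the aluminum-rod values given in this example. It evaluates the model at the 15 measurement locations for the estimates $\hat{\Phi} = -18.41$ and $\hat{h} = 0.00191$ reported in this example, and it checks both boundary conditions with centered differences.

```python
import numpy as np

a = b = 0.95                     # cm
L = 70.0                         # cm
k = 2.37                         # W/(cm C), aluminum
T_amb = 21.29                    # C
x = 10.0 + 4.0 * np.arange(15)   # x_i = x_0 + (i - 1) dx, in cm


def T_s(x, Phi, h):
    """Steady-state rod temperature c1 e^{-gamma x} + c2 e^{gamma x} + T_amb."""
    gamma = np.sqrt(2.0 * (a + b) * h / (a * b * k))
    num = np.exp(gamma * L) * (h + k * gamma)
    den = np.exp(-gamma * L) * (h - k * gamma) + np.exp(gamma * L) * (h + k * gamma)
    c1 = -Phi / (k * gamma) * num / den
    c2 = Phi / (k * gamma) + c1
    return c1 * np.exp(-gamma * x) + c2 * np.exp(gamma * x) + T_amb


Phi, h = -18.41, 0.00191         # estimates reported in the text
print(np.round(T_s(x, Phi, h), 2))

# Check both boundary conditions with centered differences
d = 1e-4
dT0 = (T_s(d, Phi, h) - T_s(-d, Phi, h)) / (2 * d)
dTL = (T_s(L + d, Phi, h) - T_s(L - d, Phi, h)) / (2 * d)
print(dT0, Phi / k)                             # dT/dx(0) = Phi / k
print(dTL, h / k * (T_amb - T_s(L, Phi, h)))    # dT/dx(L) = (h/k)(T_amb - T_s(L))
```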

A least squares fit to the data yielded the parameter estimates $\hat{\Phi} = -18.41$ W/cm² and $\hat{h} = 0.00191$ W/(cm² °C) and the model fit shown in Figure 7.3(a). We note that this value of $h$ falls within the range $2.8 \times 10^{-4}$ to $0.0023$ W/(cm² °C) reported for still air. The residuals plotted in Figure 7.3(b) exhibit no discernible pattern, thus motivating the assumption that the errors $\varepsilon_i$ are iid. We assume that errors are normally distributed when constructing a sampling distribution.


The error variance estimate is $\hat{\sigma}^2 = 0.0627$, and the covariance matrix, computed using analytic sensitivity relations, as derived in Exercise 7.4 and illustrated in Figure 7.4, is

$$ V = \begin{bmatrix} 2.1034 \times 10^{-2} & -2.0286 \times 10^{-6} \\ -2.0286 \times 10^{-6} & 2.0973 \times 10^{-10} \end{bmatrix}. \qquad (7.46) $$

The standard deviations for the errors and sampling distribution are

$$ \hat{\sigma} = 0.2504, \qquad \sigma_\Phi = 0.1450, \qquad \sigma_h = 1.4482 \times 10^{-5}. \qquad (7.47) $$

Since $n = 15$ and $p = 2$, the 95% confidence intervals are

$$ \Phi \in [-18.7233, -18.0967], \qquad h \in [1.8787 \times 10^{-3}, 1.9413 \times 10^{-3}]. $$
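These confidence intervals follow from (7.40) with $n - p = 13$ degrees of freedom. This sketch takes the estimates and the standard deviations (7.47) as given and uses the tabulated value $t_{13,0.975} \approx 2.160$. It also confirms that $\hat{\sigma} = \sqrt{0.0627} \approx 0.2504$.

```python
import numpy as np

q_hat = np.array([-18.41, 0.00191])   # [Phi, h] estimates from the text
sd = np.array([0.1450, 1.4482e-5])    # standard deviations from (7.47)
n, p = 15, 2
t_crit = 2.160                        # t_{13, 0.975} from a t-table

sigma_hat = np.sqrt(0.0627)           # error standard deviation estimate
lower = q_hat - t_crit * sd
upper = q_hat + t_crit * sd
print(f"sigma_hat = {sigma_hat:.4f}")
print(f"Phi in [{lower[0]:.4f}, {upper[0]:.4f}]")
print(f"h   in [{lower[1]:.4e}, {upper[1]:.4e}]")
```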


In Example 8.12, we revisit this example in the context of Bayesian analysis.

Figure 7.3. (a) Model fit to the steady state temperature data and (b) residuals at the 15 spatial locations.

Figure 7.4. Analytic sensitivity values: (a) $\partial y/\partial\Phi$ and (b) $\partial y/\partial h$.


7.3.2 Parameter and Error Variance Estimators for Evolution Models: Multiple Responses

In this section, we consider the evolution equation (7.1) with $\nu > 1$ data measurements and model responses specified by a $\nu \times N$ matrix $C$. The statistical model in this case is

$$ \Upsilon_i = f(t_i, q_0) + \varepsilon_i, \qquad i = 1, \ldots, n, $$

where $\Upsilon_i$ and $\varepsilon_i$ are random $\nu$-vectors.

Assumption 7.17. To accommodate the possibility that error distributions associated with individual components of the observations could differ, we let $\sigma_{0j}^2$ denote the fixed but unknown variance of the error associated with the $j$th observation. These values are compiled in the $\nu \times \nu$ diagonal measurement error covariance matrix $V_0 = \mathrm{diag}[\sigma_{01}^2, \ldots, \sigma_{0\nu}^2]$. As before, errors are assumed to be unbiased. We remind the reader that $V_0$ is fixed but typically unknown.

The construction of parameter and covariance estimators is similar in theory to the scalar case $\nu = 1$ but is complicated by the coupling induced by the potentially differing variances of the error components. We provide an overview of the estimators, estimates, and sampling distribution for $\nu > 1$ and refer the reader to [24, 26] for details.


Example 7.18. It was noted in Example 3.2 that for vibrating systems modeled as a simple harmonic oscillator (3.11), displacements and velocities can be respectively measured using a proximity sensor and laser vibrometer. If both sets of measurements are available, the modeled observations will be

$$ \begin{bmatrix} y_1(t_i, q) \\ y_2(t_i, q) \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} z_1(t_i, q) \\ z_2(t_i, q) \end{bmatrix}, $$

which is just the parameter-dependent states. Given the differing nature of the measurement devices, one would expect different error distributions to be associated with the experimental measurements $\upsilon_1$ and $\upsilon_2$. Hence we would employ

$$ V_0 = \begin{bmatrix} \sigma_{01}^2 & 0 \\ 0 & \sigma_{02}^2 \end{bmatrix}. \qquad (7.48) $$


Example 7.19. For the HIV model (3.10) of Example 3.3, one can typically measure only the total number $T_1 + T_1^*$ of T-lymphocytes and the viral load $V$. Hence

$$ C = \begin{bmatrix} 1 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 \end{bmatrix} $$

and $y(t, q) \in \mathbb{R}^2$. The error covariance matrix would again have the structure (7.48).


Parameter and Error Covariance Estimators

The OLS estimator and estimate are taken to be

$$ \hat{Q}_{OLS} = \mathop{\mathrm{argmin}}_{q \in \mathcal{Q}}\sum_{i=1}^n[\Upsilon_i - f(t_i, q)]^TV_0^{-1}[\Upsilon_i - f(t_i, q)], $$

$$ \hat{q}_{OLS} = \mathop{\mathrm{argmin}}_{q \in \mathcal{Q}}\sum_{i=1}^n[\upsilon_i - f(t_i, q)]^T\hat{V}^{-1}[\upsilon_i - f(t_i, q)], \qquad (7.49) $$

where $V_0^{-1}$ weights the response components by the reciprocals of the corresponding error variances associated with each component. Since $V_0$ is typically unknown, it too must be estimated. Motivated by (7.37), the estimate $\hat{V} \approx V_0$ is provided by the relation

$$ \hat{V} = \mathrm{diag}\left(\frac{1}{n-p}\sum_{i=1}^n[\upsilon_i - f(t_i, \hat{q}_{OLS})][\upsilon_i - f(t_i, \hat{q}_{OLS})]^T\right). \qquad (7.50) $$

Unlike the scalar response relations (7.31) and (7.37), the multiple response relations (7.49) and (7.50) are coupled due to the fact that $V_0 \neq \sigma_0^2I_\nu$, and hence they must be solved as a coupled system.


Sampling Distribution

To specify a sampling distribution, we need an assumption analogous to Assumption 7.7.

Assumption 7.20. Let $\varepsilon_{ji}$ denote the error in the $j$th component of $\Upsilon_i$ at time $t_i$. We make the assumption that $\varepsilon_{ji} \sim N(0, \sigma_{0j}^2)$ so that $\varepsilon_i \sim N(0, V_0)$. For $n$ sufficiently large, the central limit theorem can be invoked in the manner detailed in Property 7.9 to obtain similar asymptotic results.

With this assumption, it is shown in [24, 26] that

$$ \hat{Q}_{OLS} \sim N(q_0, \mathcal{V}_0) \approx N(\hat{q}_{OLS}, \hat{\mathcal{V}}), $$
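where

$$ \mathcal{V}_0 = \left[\sum_{i=1}^n\chi_i^T(q_0)V_0^{-1}\chi_i(q_0)\right]^{-1} $$

is the $p \times p$ covariance matrix and

$$ \chi_i(q) = \begin{bmatrix} \frac{\partial f_1(t_i, q)}{\partial q_1} & \cdots & \frac{\partial f_1(t_i, q)}{\partial q_p} \\ \vdots & & \vdots \\ \frac{\partial f_\nu(t_i, q)}{\partial q_1} & \cdots & \frac{\partial f_\nu(t_i, q)}{\partial q_p} \end{bmatrix} \qquad (7.51) $$

is the $\nu \times p$ sensitivity matrix at time $t_i$. For implementation, $\mathcal{V}_0$ is approximated by

$$ \hat{\mathcal{V}} = \left[\sum_{i=1}^n\chi_i^T(\hat{q}_{OLS})\hat{V}^{-1}\chi_i(\hat{q}_{OLS})\right]^{-1}, $$

where (7.51) must be evaluated at each time step. The $(1 - \alpha) \times 100\%$ confidence intervals are

$$ \left[\hat{q}_{OLS,k} - t_{n-p,1-\alpha/2}\cdot SE_k,\ \hat{q}_{OLS,k} + t_{n-p,1-\alpha/2}\cdot SE_k\right], $$

where $\hat{q}_{OLS,k}$ is the $k$th element of $\hat{q}_{OLS}$ and the standard error is

$$ SE_k = \sqrt{\hat{\mathcal{V}}_{kk}}. $$

Here $\hat{\mathcal{V}}_{kk}$ is the $k$th diagonal element of $\hat{\mathcal{V}}$.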
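The coupling between (7.49) and (7.50) suggests a simple fixed-point iteration. One starts with a guess for $V_0$, computes the weighted estimate, updates the diagonal variance estimate from the residuals, and repeats until the variances stop changing. The sketch below uses a hypothetical $\nu = 2$ response model that is linear in $q$, so that each weighted least squares problem has a closed-form solution; for nonlinear $f$ the inner solve would be replaced by an optimizer. It also evaluates the covariance $\hat{\mathcal{V}}$ and the standard errors, using $\chi_i = G_i$ for this linear model.

```python
import numpy as np

rng = np.random.default_rng(6)
n, p = 200, 2
t = np.linspace(0.0, 2.0, n)
q0 = np.array([1.0, -0.5])                    # hypothetical true parameters
sig0 = np.array([0.05, 0.5])                  # different error scales per response

# Hypothetical nu = 2 response model f(t_i, q) = G_i q (linear in q for brevity)
G = np.stack([np.array([[1.0, ti], [1.0, -ti]]) for ti in t])   # shape (n, 2, 2)
v = G @ q0 + sig0 * rng.standard_normal((n, 2))                  # shape (n, 2)

V = np.eye(2)                                 # initial guess for V_0
for _ in range(20):
    W = np.linalg.inv(V)
    A = np.einsum("iak,ab,ibl->kl", G, W, G)  # sum_i G_i^T V^{-1} G_i
    rhs = np.einsum("iak,ab,ib->k", G, W, v)  # sum_i G_i^T V^{-1} v_i
    q = np.linalg.solve(A, rhs)               # weighted estimate (7.49)
    r = v - G @ q                             # residual vectors, shape (n, 2)
    V_new = np.diag(np.sum(r**2, axis=0) / (n - p))   # diagonal estimate (7.50)
    if np.allclose(V_new, V, rtol=1e-10, atol=0.0):
        break
    V = V_new

cov_q = np.linalg.inv(np.einsum("iak,ab,ibl->kl", G, np.linalg.inv(V), G))
se = np.sqrt(np.diag(cov_q))                  # SE_k = sqrt of diagonal of V_hat
print("q_hat =", q, " SE =", se)
print("estimated error std devs =", np.sqrt(np.diag(V)))
```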


7.4 Notes and References

The parameter estimation techniques discussed in this chapter are based on linear and nonlinear regression, for which there are numerous excellent texts. The text [96] provides a very nice introduction to linear regression and has the advantage that the authors use different notation to delineate between random variables and their realizations. This is also a good resource for obtaining additional background regarding the confidence and prediction intervals discussed in Chapter 9. Asymptotic theory for nonlinear regression problems is detailed in the classic book [219]. We refer readers to [24, 26] for details regarding the construction of estimators and specification of sampling distributions for parameters in nonlinear evolution models.

For brevity, we do not discuss the following topics: infinite-dimensional inverse problems associated with parameter estimation, regularization, or optimization methods for inverse problems. The reader is referred to [25] for theory and estimation techniques for distributed parameter systems and [15, 128, 176, 244, 256] for details regarding regularization, computational algorithms, and case studies pertaining to parameter estimation and inverse problems. The texts [64, 131, 132, 232] cover a variety of optimization techniques that are appropriate for this class of problems.
7.5 Exercises

Exercise 7.1. Consider the unforced spring model (7.41) with displacement observations. Construct and solve the sensitivity equations for the damping parameter $C$ and stiffness parameter $K$. Show that your observed solutions are the same as those obtained by differentiating the solution $y(t, q)$.

Exercise 7.2. For the spring model (7.41), approximate the time-dependent sensitivities using the finite difference relations (7.35) and compare your solutions to those obtained in Exercise 7.1 for various stepsizes $h_K$ and $h_C$.

Exercise 7.3. Consider the unforced spring model (7.41) with displacement observations and initial conditions $z(0) = z_0$, $\dot{z}(0) = -C$. Construct and solve the sensitivity equations for the initial condition $z_0$. Compare your answer to the solution obtained by differentiating $y(t, q)$ with respect to $z_0$.

Exercise 7.4. Compute the analytic sensitivity relations $\frac{\partial y}{\partial \Phi}$ and $\frac{\partial y}{\partial h}$ for the steady state heat model in Example 7.16. Plot your solutions at the points specified in the example.

Exercise 7.5. Use the finite difference expression (7.35) to approximate the sensitivity relations $\frac{\partial y}{\partial \Phi}$ and $\frac{\partial y}{\partial h}$ for the model in Example 7.16. Compare your solutions to the analytic sensitivity relations developed in Exercise 7.4 for various stepsizes. Discuss criteria that should be used to specify stepsizes.


Exercise 7.6. Repeat the analysis and numerical experiments detailed in Exam- 
ple 7.15 for the spring model 


z I 0,15* + Kz - 0, 
z(p) = -2 , i( 0 ) = Z\ 

with displacement observations. Hence the parameters to be estimated are q 

Exercise 7.7. Repeat the analysis of Example 7.16 for the steady state heat model using the copper data in Table 3.3. Do your residuals appear to be iid? We will revisit this problem in Chapter 12.


Exercise 7.8. In this problem, we will model heat generated during the hardening of cement. Data from [104] is compiled in Table 7.4. Here $y$ denotes heat with units of calories/gram cement, and $x_1, \ldots, x_4$ respectively denote the percentages of tricalcium aluminate, tricalcium silicate, tetracalcium aluminoferrite, and dicalcium silicate.

(a) Consider first the linear model

$$\Upsilon = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \varepsilon.$$

Estimate the parameters, plot the residuals, and determine confidence intervals of two standard deviations as well as 95% confidence intervals.
(b) Perform the same analysis using linear models that incorporate only $x_1$ as well as $x_1$ and $x_2$. How do your results compare with those obtained in (a)?


Obs. No.   x1   x2   x3   x4       y
    1       7   26    6   60    78.5
    2       1   29   15   52    74.3
    3      11   56    8   20   104.3
    4      11   31    8   47    87.6
    5       7   52    6   33    95.9
    6      11   55    9   22   109.2
    7       3   71   17    6   102.7
    8       1   31   22   44    72.5
    9       2   54   18   22    93.1
   10      21   47    4   26   115.9
   11       1   40   23   34    83.8
   12      11   66    9   12   113.3
   13      10   68    8   12   109.4

Table 7.4. Cement data from [104].




Chapter 8 


Bayesian Techniques for 
Parameter Estimation 


For applications where modeling and measurement errors $\varepsilon_i$ are unbiased and iid, we employ the statistical model

$$\Upsilon_i = f_i(Q) + \varepsilon_i, \quad i = 1, \ldots, n, \tag{8.1}$$

where $\Upsilon_i$, $\varepsilon_i$, and $Q$ are random variables representing measurements, measurement errors, and parameters. As defined in (7.9), $f_i(Q)$ denotes the parameter-dependent model response. We note that the measurement errors in this case are modeled as additive and mutually independent from $Q$. We also remind readers that calibration parameters and observed data are commonly denoted by $\theta$ and $y$ in the statistics literature.


8.1 Parameter Estimation from a Bayesian 
Perspective 

As detailed in Section 4.8, the tenets of Bayesian inference differ significantly from the frequentist perspective described in Chapter 7. In the context of inverse problems involving parameter estimation, the Bayesian approach can be summarized as follows. Parameters are considered to be random variables $Q$ with realizations $q = Q(\omega)$ and associated densities that incorporate known information or information obtained as measurements are acquired. The solution of the inverse problem is the posterior density that best reflects the distribution of parameter values based on the sampled observations.

It was shown in Section 4.8.2 that the posterior is constructed in terms of a prior density and a likelihood. The prior density incorporates any knowledge that we have about parameters prior to obtaining observations $v$. This could come from previous similar experiments or analysis regarding similar models. It was also noted that if no such knowledge is available, one employs a noninformative prior, which is often taken as an improper uniform density posed on the parameter support; for example, one would employ $\pi_0(q) = \chi_{(0,\infty)}(q)$ for positive parameters.
The likelihood function $\pi(v|q) = L(q|v)$ incorporates information provided by the samples and constitutes the mechanism through which data informs the posterior density. As detailed in Section 4.3.2, the likelihood quantifies the probability of obtaining the observations $v$ for a given value $q$ of the parameter $Q$. Hence if we let $\pi(q, v)$ denote the joint density of $Q$ and $\Upsilon$, then the likelihood

$$\pi(v|q) = \frac{\pi(q, v)}{\pi_Q(q)}$$

is the conditional density of $\Upsilon$ given a value of $Q$.

Once we have a measurement or observation $v = v_{obs}$, the conditional density

$$\pi(q|v_{obs}) = \frac{\pi(q, v_{obs})}{\pi_\Upsilon(v_{obs})},$$

where we assume that

$$\pi_\Upsilon(v_{obs}) = \int_{\mathbb{R}^p} \pi(q, v_{obs})\,dq = \int_{\mathbb{R}^p} \pi(v_{obs}|q)\,\pi_0(q)\,dq \neq 0,$$

is the posterior density. The inverse problem in the Bayesian framework can thus be stated as follows: given measurements $v_{obs}$, find the posterior density $\pi(q|v_{obs})$. The complete formulation, which the authors of [128] refer to as Bayes' theorem of inverse problems, can be stated as follows.


Result 8.1 (Bayes' Theorem of Inverse Problems). We assume that the $p$ random parameter variables $Q$ have a known prior density $\pi_0(q)$, which can be noninformative, and we let $v_{obs}$ be a realization of the random observation variable $\Upsilon$. The posterior density of $Q$, given the measurements, is

$$\pi(q|v_{obs}) = \frac{\pi(v_{obs}|q)\,\pi_0(q)}{\pi_\Upsilon(v_{obs})} = \frac{\pi(v_{obs}|q)\,\pi_0(q)}{\int_{\mathbb{R}^p} \pi(v_{obs}|q)\,\pi_0(q)\,dq}. \tag{8.2}$$

When using (8.2), one implicitly assumes that observed data is used to construct the posterior density; hence we write $v = v_{obs}$ in subsequent discussion so that (8.2) is the same as (4.41).


8.1.1 Likelihood Function 


The specification of the likelihood function depends on the assumptions made regarding the distribution of errors. In Section 4.3.2, we showed that if we employ the statistical model (8.1) with the assumption that errors are iid and $\varepsilon_i \sim N(0, \sigma^2)$, where $\sigma^2$ is fixed, then the likelihood function is

$$\pi(v|q) = L(q, \sigma^2|v) = \frac{1}{(2\pi\sigma^2)^{n/2}}\, e^{-SS_q/2\sigma^2}, \tag{8.3}$$

where

$$SS_q = \sum_{i=1}^n [v_i - f_i(q)]^2 \tag{8.4}$$

is the sum of squares error. The construction of likelihoods for other error models, including multiplicative noise, is addressed in [128].
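As a minimal sketch in Python (standard library only; the function names are illustrative and not from the text), the likelihood (8.3) is safer to evaluate through its logarithm. With 501 hypothetical residuals of size 1 and $\sigma^2 = 0.01$, the exponential factor in (8.3) underflows to zero, whereas the log-likelihood remains finite.

```python
import math

def sum_of_squares(v, f):
    """SS_q = sum_i [v_i - f_i(q)]^2, cf. (8.4); f holds the model responses f_i(q)."""
    return sum((vi - fi) ** 2 for vi, fi in zip(v, f))

def gaussian_likelihood(v, f, sigma2):
    """Direct evaluation of (8.3); prone to underflow for large n or small sigma^2."""
    n = len(v)
    return (2.0 * math.pi * sigma2) ** (-n / 2) * math.exp(-sum_of_squares(v, f) / (2.0 * sigma2))

def gaussian_log_likelihood(v, f, sigma2):
    """log of (8.3): -(n/2) log(2 pi sigma^2) - SS_q / (2 sigma^2)."""
    n = len(v)
    return -0.5 * n * math.log(2.0 * math.pi * sigma2) - sum_of_squares(v, f) / (2.0 * sigma2)

# Hypothetical residuals: 501 observations, each off by 1, with sigma^2 = 0.01.
v = [1.0] * 501
f = [0.0] * 501
print(gaussian_likelihood(v, f, 0.01))       # 0.0: exp(-25050) underflows
print(gaussian_log_likelihood(v, f, 0.01))   # a finite negative number
```

Working with the log form is also what makes the difference-of-sums-of-squares expressions used later in this chapter numerically safe.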
8.1.2 Maximum a Posteriori (MAP) Estimate

The posterior density $\pi(q|v)$ provides the complete distribution of $Q$ based on the observations $v$. From this, point estimates for the parameter values are provided by the mean, median, or mode. The latter is defined as the parameter value that maximizes $\pi(q|v)$. This value, termed the MAP estimate, is given by

$$q_{MAP} = \arg\max_q \pi(q|v).$$

Since the normalization constant $\pi_\Upsilon(v)$ does not affect the maximizing argument, an equivalent formulation is

$$q_{MAP} = \arg\max_q \pi(v|q)\,\pi_0(q). \tag{8.5}$$

For a uniform prior on the parameter support, $q_{MAP}$ is thus equivalent to the maximum likelihood estimate $q_{MLE}$ defined in (4.28). As detailed in Section 4.3.2, one would typically employ the log-likelihood function $\ell(q, \sigma_0^2|v)$ in such cases since it facilitates optimization by eliminating the exponential.

8.1.3 Implementation Techniques 

The formulation of the inverse problem in the Bayesian framework is concisely provided by Result 8.1. However, the implementation of (8.2) is extremely challenging if the dimensionality $p$ of $Q$ is large, as is often the case for physical or biological models. As illustrated in the next example, classical tensored quadrature rules can be applied for low dimensionality; e.g., $p < 6$. Within the last ten years, significant research has focused on the development of adaptive sparse grid quadrature techniques for moderate dimensionality and Monte Carlo techniques for high dimensions. These techniques are discussed in Chapter 11.

Alternatively, one can construct Markov chains whose stationary distribution is the posterior density. We discuss Markov chain Monte Carlo (MCMC) techniques in Section 8.2.

Example 8.2. Consider the spring model

$$\ddot z + C\dot z + Kz = 0, \qquad z(0) = 2, \quad \dot z(0) = -C,$$

which, for $C^2 - 4K < 0$, has the solution

$$z(t) = 2e^{-Ct/2}\cos\left(\sqrt{K - C^2/4}\cdot t\right).$$

We assume displacement observations so that $y(t_i, Q) = z(t_i, Q)$. We consider $K = 20.5$ to be known and treat $Q = C$ as the unknown parameter to be estimated. To construct synthetic data, we take $C_0 = 1.5$ and construct iid errors $\varepsilon_i \sim N(0, \sigma_0^2)$, where $\sigma_0 = 0.1$.
For this error distribution, the likelihood is given by (8.3). We employ the noninformative prior $\pi_0(q) = \chi_{[0,\infty)}(q)$ to enforce $C$ to be nonnegative. The posterior density can thus be expressed as

$$\pi(q|v) = \frac{e^{-SS_q/2\sigma_0^2}}{\int_0^\infty e^{-SS_\zeta/2\sigma_0^2}\,d\zeta} = \frac{1}{\int_0^\infty e^{-(SS_\zeta - SS_q)/2\sigma_0^2}\,d\zeta},$$

where $SS_q$ is defined in (8.4) and $SS_\zeta$ denotes the sum of squares defined in terms of the integration variable $\zeta$. The second formulation is necessary to avoid the numerical evaluation of 0/0 that occurs when $e^{-SS_q/2\sigma_0^2}$ underflows to zero. Approximating the integral with a quadrature rule yields

$$\pi(q|v) \approx \frac{1}{\sum_j e^{-(SS_{\zeta^j} - SS_q)/2\sigma_0^2}\, w^j}, \tag{8.6}$$

where $\zeta^j$ and $w^j$ respectively denote the quadrature points and weights.

We generated one set of synthetic data $v_i$, $i = 1, \ldots, 501$, which is plotted in Figure 8.1(a) along with the model response. The posterior density given by (8.6) is plotted in Figure 8.1(b), and its mode is the MAP estimate $C_{MAP}$. Since we have employed a noninformative prior, this corresponds to the MLE. For the assumed error distribution, it also corresponds to the OLS estimate.

We showed in Chapter 7 that the OLS estimator has the sampling distribution

$$C_{OLS} \sim N\left(C_0,\; \sigma_0^2\left[\mathcal{X}^T(C_0)\,\mathcal{X}(C_0)\right]^{-1}\right), \tag{8.7}$$

where $\mathcal{X}(C_0)$ is given in (7.43) of Example 7.15. The sampling distribution is compared with the posterior density in Figure 8.1(b). We note that they have the same shape but the sampling distribution is centered at $C_0$.

We will revisit this example in Example 8.7, where we construct $\pi(q|v)$ using MCMC methods.
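The quadrature evaluation of the posterior in this example can be sketched in Python as follows (standard library only). The random seed, the truncated interval [1.3, 1.7] standing in for the support $[0, \infty)$, and the composite trapezoid rule are our assumptions rather than choices stated in the text. One further deviation: each exponent is shifted by the minimum sum of squares over the grid instead of by $SS_q$. The normalized density is algebraically identical to (8.6), but the exponentials then never exceed 1, which matters because Python's `math.exp` raises `OverflowError` for large arguments.

```python
import math
import random

def spring_response(C, K, t):
    """z(t) = 2 exp(-C t / 2) cos(sqrt(K - C^2/4) t), valid when C^2 - 4K < 0."""
    omega = math.sqrt(K - C * C / 4.0)
    return 2.0 * math.exp(-C * t / 2.0) * math.cos(omega * t)

# Synthetic data mimicking Example 8.2; the seed is an arbitrary choice.
K, C0, sigma0 = 20.5, 1.5, 0.1
times = [0.01 * i for i in range(501)]           # 501 points on [0, 5]
rng = random.Random(0)
v = [spring_response(C0, K, t) + rng.gauss(0.0, sigma0) for t in times]

def ss(C):
    """Sum of squares (8.4) as a function of the damping parameter C."""
    return sum((vi - spring_response(C, K, t)) ** 2 for vi, t in zip(v, times))

# Composite trapezoid rule on a truncated interval [a, b] (an assumption).
a, b, N = 1.3, 1.7, 401
h = (b - a) / (N - 1)
zeta = [a + h * j for j in range(N)]                        # quadrature points
w = [h / 2 if j in (0, N - 1) else h for j in range(N)]     # quadrature weights
ss_zeta = [ss(z) for z in zeta]
ss_min = min(ss_zeta)

# Shifting every exponent by ss_min keeps exp(.) <= 1, so nothing overflows.
unnorm = [math.exp(-(s - ss_min) / (2 * sigma0 ** 2)) for s in ss_zeta]
Z = sum(wj * uj for wj, uj in zip(w, unnorm))
post = [uj / Z for uj in unnorm]                  # posterior values at the grid points

C_map = zeta[max(range(N), key=lambda j: post[j])]
print(f"C_MAP on the grid: {C_map:.3f}")
```

Because the grid values are normalized by the same trapezoid rule, the computed density integrates to one on the grid by construction; the quality of the approximation then depends on the grid spacing and on the truncated interval containing essentially all of the posterior mass.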



Figure 8.1. (a) Synthetic data $v_i$ and model response. (b) Posterior density and sampling distribution (8.7).
8.2 Markov Chain Monte Carlo (MCMC) Techniques

The evaluation of the posterior relation (8.2) using quadrature techniques requires the evaluation of densities over the region of $\mathbb{R}^p$ where the posterior is defined. For moderate $p$, this necessitates the use of the sparse grid quadrature techniques discussed in Chapter 11, whereas Monte Carlo integration techniques are required for large dimensionality $p$. The difficulty of this approach is exacerbated by the fact that the support of the density is often part of the information that we are seeking.

An alternative is the following. Rather than using quadrature or Monte Carlo 
algorithms to specify parameter values at which we evaluate the density, we can 
use attributes of the density to specify parameter values that adequately explore 
the geometry of the distribution. This is achieved by constructing Markov chains 
whose stationary distribution, as defined in Definition 4.52, is the posterior density. 
By evaluating realizations of the chain, one thus samples the posterior and hence
obtains a density for the parameter values based on observed measurements. This 
is the basis for the MCMC techniques employed here. 

In Section 8.3, we summarize the Metropolis and Metropolis-Hastings algorithms and motivate their structure. The detailed balance condition defined in Definition 4.62 is used in Section 8.4 to establish that $\pi(q|v)$ is the stationary distribution for the chain. We also discuss convergence criteria in that section. The role of parameter identifiability is discussed in Section 8.5, and the development of the delayed rejection adaptive Metropolis (DRAM) algorithm is detailed in Section 8.6. This is the algorithm that we employ in subsequent chapters. The DiffeRential Evolution Adaptive Metropolis (DREAM) algorithm is summarized in Section 8.7. The reader is referred to Section 4.6 for relevant definitions and theory pertaining to Markov chains.


8.3 Metropolis and Metropolis-Hastings Algorithms

Recall from Definition 4.51 that a Markov chain is a sequence of $S$-valued random variables that satisfy the Markov property that $X_k$ depends only on $X_{k-1}$. The state space in this case is the set of possible parameter values, so we will be constructing chains based on parameters chosen according to the following strategy.

Strategy 8.3. Consider the parameter $q^0 \in \mathbb{R}^p$ to be specified.

(i) Take the current chain realization to be $X_{k-1} = q^{k-1}$.

(ii) Propose a new value $q^* \sim J(q^*|q^{k-1})$, where $J$ is called the proposal or jumping distribution. The notation indicates that $J$ specifies $q^*$ based on the previous value $q^{k-1}$, and $J(q^*|q^{k-1})$ should not be interpreted as a conditional density.

(iii) With probability $\alpha(q^*|q^{k-1})$ determined by properties of the likelihood function and prior density, accept $q^*$; i.e., $X_k = q^*$. Otherwise, take $X_k = q^{k-1}$. We note that $\alpha(q^*|q^{k-1})$ is not a conditional probability but rather specifies the probability of accepting $q^*$ generated from the previous value $q^{k-1}$.

(iv) Establish that the posterior density is the stationary distribution for the chain.
8.3.1 Metropolis Algorithm

We consider first the case when the proposal distribution is taken to be symmetric in the sense that $J(q^*|q^{k-1}) = J(q^{k-1}|q^*)$. We consider two possibilities for the proposal distribution:

$$J(q^*|q^{k-1}) = N(q^{k-1}, V) \quad \text{or} \quad J(q^*|q^{k-1}) = N(q^{k-1}, D). \tag{8.8}$$

Here $V$ is the covariance matrix for $Q$, whereas $D$ is a diagonal matrix whose elements reflect the scale associated with each parameter value. The symmetry in the first case follows since

$$J(q^*|q^{k-1}) = \frac{1}{\sqrt{(2\pi)^p\,|V|}}\, e^{-(q^* - q^{k-1})^T V^{-1} (q^* - q^{k-1})/2} = J(q^{k-1}|q^*).$$

The analysis of the second choice is similar. We will provide further motivation for these choices after we summarize the algorithm.

Algorithm 8.4 (Metropolis Algorithm).

1. Initialization: Choose an initial parameter value $q^0$ that satisfies $\pi(q^0|v) > 0$.

2. For $k = 1, \ldots, M$

(a) For $z_k \sim N(0, I_p)$, construct the candidate

$$q^* = q^{k-1} + Rz_k,$$

where $R$ is the Cholesky decomposition of $V$ or $D$. As specified in Theorem 4.23, this ensures that

$$q^* \sim N(q^{k-1}, V) \quad \text{or} \quad q^* \sim N(q^{k-1}, D).$$

Because the construction of $q^*$ takes into account $q^{k-1}$, this is termed a random walk or local Metropolis algorithm.

(b) Compute the ratio

$$r(q^*|q^{k-1}) = \frac{\pi(q^*|v)}{\pi(q^{k-1}|v)}. \tag{8.9}$$

(c) Set

$$q^k = \begin{cases} q^*, & \text{with probability } \alpha = \min(1, r), \\ q^{k-1}, & \text{else.} \end{cases}$$

That is, we accept $q^*$ with probability 1 if $r \ge 1$, and we accept it with probability $r$ if $r < 1$.
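Algorithm 8.4 can be sketched for a two-parameter problem as follows. This is a minimal illustration, not the book's implementation: the target is a hypothetical correlated Gaussian standing in for $\pi(q|v)$, only the unnormalized log posterior is evaluated (the normalization constant cancels in (8.9)), and the helper names are ours. We use the lower-triangular Cholesky factor $R$ with $RR^T = V$, so that $q^* = q^{k-1} + Rz_k$ has covariance $V$; MATLAB's `chol` instead returns the upper-triangular factor, whose transpose then plays this role.

```python
import math
import random

def cholesky_2x2(V):
    """Lower-triangular R with R R^T = V for a symmetric positive definite 2x2 V."""
    r11 = math.sqrt(V[0][0])
    r21 = V[1][0] / r11
    r22 = math.sqrt(V[1][1] - r21 * r21)
    return [[r11, 0.0], [r21, r22]]

def metropolis(log_post, q0, V, M, seed=0):
    """Random walk Metropolis with proposal N(q^{k-1}, V); returns (chain, acceptance ratio)."""
    rng = random.Random(seed)
    R = cholesky_2x2(V)
    q, lp = list(q0), log_post(q0)
    chain, accepted = [list(q)], 0
    for _ in range(M):
        z = [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]
        q_star = [q[0] + R[0][0] * z[0], q[1] + R[1][0] * z[0] + R[1][1] * z[1]]
        lp_star = log_post(q_star)
        # Accept with probability min(1, r), r = pi(q*|v) / pi(q^{k-1}|v), computed on the log scale.
        if rng.random() < math.exp(min(0.0, lp_star - lp)):
            q, lp = q_star, lp_star
            accepted += 1
        chain.append(list(q))
    return chain, accepted / M

# Hypothetical target: Gaussian with mean (1, 2), unit variances, correlation 0.8.
S_inv = [[1 / 0.36, -0.8 / 0.36], [-0.8 / 0.36, 1 / 0.36]]

def log_post(q):
    d = [q[0] - 1.0, q[1] - 2.0]
    return -0.5 * (d[0] * (S_inv[0][0] * d[0] + S_inv[0][1] * d[1])
                   + d[1] * (S_inv[1][0] * d[0] + S_inv[1][1] * d[1]))

chain, acceptance = metropolis(log_post, [0.0, 0.0], [[0.5, 0.4], [0.4, 0.5]], 20000)
print(acceptance)
```

Here the proposal covariance is chosen proportional to the target covariance, which mimics the role of $V$ in (8.8).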
We first motivate the choice of acceptance criteria in steps 2(b) and 2(c). The first observation is that by forming the ratio of the posterior densities, we eliminate the normalization constant, which is difficult to compute when $p$ is moderate or large. Now consider the case of a uniform prior and iid and normally distributed errors so that the likelihood is

$$\pi(v|q) = \frac{1}{(2\pi\sigma^2)^{n/2}}\, e^{-SS_q/2\sigma^2}, \qquad SS_q = \sum_{i=1}^n [v_i - f_i(q)]^2, \tag{8.10}$$

as established in (8.3) and (8.4). With these assumptions

$$r(q^*|q^{k-1}) = \frac{\pi(v|q^*)}{\pi(v|q^{k-1})} = \frac{e^{-SS_{q^*}/2\sigma^2}}{e^{-SS_{q^{k-1}}/2\sigma^2}} = e^{-(SS_{q^*} - SS_{q^{k-1}})/2\sigma^2}, \tag{8.11}$$

where the final step eliminates the potential for numerical 0/0 evaluations, as noted in Example 8.2. As illustrated in Figure 8.2, a candidate $q^*$ that yields $\pi(v|q^*) > \pi(v|q^{k-1})$ is equivalent to producing a smaller sum of squares error, and this candidate is accepted with probability one. If $q^*$ is such that $\pi(v|q^*) < \pi(v|q^{k-1})$, and hence the sum of squares error is increased, we accept the candidate with probability $\alpha = r$.
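The acceptance test (8.11) can be sketched as follows; the function name is ours, and the uniform draw `u` is passed in explicitly so that the decision is easy to inspect. The first `print` shows the 0/0 problem: for hypothetical sums of squares 501 and 500 with $\sigma^2 = 0.01$, both likelihood factors underflow to zero, whereas the difference form gives $r = e^{-50}$ directly.

```python
import math

def accept_candidate(ss_star, ss_prev, sigma2, u):
    """Acceptance test based on (8.11), given a uniform draw u in [0, 1)."""
    delta = ss_star - ss_prev
    if delta <= 0.0:
        return True                                  # r >= 1: always accept
    return u < math.exp(-delta / (2.0 * sigma2))     # r < 1: accept with probability r

# Direct evaluation of the likelihood ratio fails: numerator and denominator underflow.
ss_star, ss_prev, sigma2 = 501.0, 500.0, 0.01
print(math.exp(-ss_star / (2 * sigma2)), math.exp(-ss_prev / (2 * sigma2)))   # 0.0 0.0
# The difference form is well defined: r = exp(-50), so this candidate is rejected for u = 0.5.
print(accept_candidate(ss_star, ss_prev, sigma2, u=0.5))                    # False
```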

Properties of the proposal function and how they affect mixing are illustrated in Figures 8.3 and 8.4. If the variance is too large, a large percentage of the candidates will be rejected since they will have smaller likelihoods, and hence the chain will stagnate for long periods. The acceptance ratio will be high if the variance is small, but the algorithm will be slow to explore the parameter space.

As illustrated in Figure 8.4(a), if the posterior is highly anisotropic but the proposal distribution is isotropic, the efficiency with which the algorithm explores the various components of the parameter vector will be highly nonuniform. The choices (8.8) for $J(q^*|q^{k-1})$ address this issue by scaling the variability of each parameter component in the manner depicted in Figure 8.4(b). The goal is to achieve, to the degree possible, the efficiency of the univariate case.




Figure 8.2. (a) Likelihood and (b) sum of squares functions of $Q$.
Figure 8.3. Generation of candidates $q^*$ based on (a) narrow and (b) wide proposal functions $J(q^*|q^{k-1})$. Chains resulting from proposal functions that are (c) too narrow and (d) too wide.


The covariance matrix $V$ is estimated in the manner detailed in Section 7.3. Specifically, we take

$$V = \sigma_{OLS}^2\left[\mathcal{X}^T(q_{OLS})\,\mathcal{X}(q_{OLS})\right]^{-1}, \qquad \sigma_{OLS}^2 = \frac{1}{n - p}\sum_{i=1}^n [v_i - f_i(q_{OLS})]^2, \tag{8.12}$$

where $\mathcal{X}_{ik}(q) = \dfrac{\partial f_i(q)}{\partial q_k}$, as indicated in Table 7.3 on page 146.
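The estimate (8.12) is sketched below for $p = 2$, where $[\mathcal{X}^T\mathcal{X}]^{-1}$ has a closed form. The function name is hypothetical, and the usage relies on a hypothetical linear model $f_i(q) = q_1 + q_2 t_i$, whose sensitivity rows are $[1, t_i]$ and whose OLS estimate for the four data points below is $q_{OLS} = (0.1, 0.6)$ (obtained by hand). The singular-matrix check reflects the identifiability issue discussed in Section 8.5.

```python
def covariance_estimate(residuals, X):
    """Return (sigma2_OLS, V) from (8.12) for p = 2, given the OLS residuals
    v_i - f_i(q_OLS) and the sensitivity rows X_i = [df_i/dq_1, df_i/dq_2]."""
    n, p = len(residuals), 2
    sigma2 = sum(r * r for r in residuals) / (n - p)
    # Entries of the symmetric 2x2 matrix X^T X.
    a = sum(row[0] * row[0] for row in X)
    b = sum(row[0] * row[1] for row in X)
    d = sum(row[1] * row[1] for row in X)
    det = a * d - b * b
    if det <= 0.0:
        raise ValueError("X^T X is singular; the parameters may not be identifiable")
    # Closed-form inverse of a 2x2 matrix, scaled by sigma2.
    V = [[sigma2 * d / det, -sigma2 * b / det],
         [-sigma2 * b / det, sigma2 * a / det]]
    return sigma2, V

# Hypothetical linear model f_i(q) = q_1 + q_2 t_i; its OLS fit to these data is q = (0.1, 0.6).
t = [0.0, 1.0, 2.0, 3.0]
v = [0.0, 1.0, 1.0, 2.0]
q_ols = (0.1, 0.6)
residuals = [vi - (q_ols[0] + q_ols[1] * ti) for vi, ti in zip(v, t)]
X = [[1.0, ti] for ti in t]
sigma2, V = covariance_estimate(residuals, X)
print(sigma2, V)   # approximately 0.1 and [[0.07, -0.03], [-0.03, 0.02]]
```

For a nonlinear model such as the spring example, the rows of `X` would instead hold the sensitivities $\partial f_i/\partial q_k$ evaluated at $q_{OLS}$.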



Figure 8.4. Anisotropic posterior $\pi(q|v)$ and (a) isotropic and (b) anisotropic proposal functions $J(q^*|q^{k-1})$.
8.3.2 Sample-Based Error Variance

The assumption that errors are iid and $\varepsilon_i \sim N(0, \sigma^2)$ yields the likelihood function (8.10) and acceptance ratio (8.11) formulated in terms of $\sigma^2$. In most applications, however, $\sigma^2$ is fixed but unknown. One solution is to employ the estimate (8.12) for $\sigma^2$. Alternatively, one can treat it as an additional random parameter whose density is sampled through realizations of the Markov chain.

As illustrated in Example 4.69, the likelihood

$$\pi(v|q, \sigma^2) = \frac{1}{(2\pi\sigma^2)^{n/2}}\, e^{-SS_q/2\sigma^2}$$

is, as a function of $\sigma^2$, in the inverse-gamma family detailed in Definition 4.14, so the conjugate prior is

$$\pi_0(\sigma^2) \propto (\sigma^2)^{-(\alpha+1)}\, e^{-\beta/\sigma^2}. \tag{8.13}$$

The hyperparameters $\alpha$ and $\beta$ can be treated as design parameters. The resulting posterior density representation is

$$\pi(\sigma^2|v, q) \propto (\sigma^2)^{-(\alpha + 1 + n/2)}\, e^{-(\beta + SS_q/2)/\sigma^2},$$

so that

$$\sigma^2\,|\,(v, q) \sim \text{Inv-gamma}\left(\alpha + \frac{n}{2},\; \beta + \frac{SS_q}{2}\right). \tag{8.14}$$

An equivalent representation is

$$\sigma^2\,|\,(v, q) \sim \text{Inv-gamma}\left(\frac{n_s + n}{2},\; \frac{n_s\sigma_s^2 + SS_q}{2}\right), \tag{8.15}$$

where $n_s = 2\alpha$ and $\sigma_s^2 = \beta/\alpha$. As noted in [92], $n_s$ can be interpreted as representing the number of observations that provided the information encoded in the prior, whereas $\sigma_s^2$ represents the mean squared error of the observations. In practice, one often takes $n_s$ to be small (e.g., $n_s = 0.01$ to 1), which is consistent with a noninformative prior.

As noted in Example 4.69 and Definitions 4.13 and 4.14, random numbers from an inverse gamma distribution can be generated using the MATLAB Statistics Toolbox command gamrnd.m and then exploiting the equivalence between the gamma and inverse gamma distributions. If gamrnd.m is not available, one can use the inverse transform techniques of Chapter 4 to generate samples from the inverse gamma distribution.
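The gamma/inverse-gamma equivalence can be sketched in Python with `random.gammavariate`, used here in place of `gamrnd.m`. Note that Python's second argument is a scale parameter, so the rate in (8.15) must be inverted. The function name and the numerical values in the usage ($n = 501$, $SS_q = 5.01$, $n_s = 0.01$, $\sigma_s^2 = 0.01$) are hypothetical.

```python
import random

def sample_error_variance(ss_q, n, n_s, sigma2_s, rng):
    """Draw sigma^2 | (v, q) ~ Inv-gamma((n_s + n)/2, (n_s sigma_s^2 + SS_q)/2), cf. (8.15)."""
    shape = 0.5 * (n_s + n)
    rate = 0.5 * (n_s * sigma2_s + ss_q)
    # If X ~ Gamma(shape, rate) then 1/X ~ Inv-gamma(shape, rate);
    # random.gammavariate(alpha, beta) treats beta as a scale, so pass 1/rate.
    return 1.0 / rng.gammavariate(shape, 1.0 / rate)

# Hypothetical values: n = 501 observations with SS_q = 5.01 and a weak prior n_s = 0.01.
rng = random.Random(0)
draws = [sample_error_variance(5.01, 501, 0.01, 0.01, rng) for _ in range(5000)]
print(sum(draws) / len(draws))   # close to the inverse-gamma mean rate/(shape - 1), about 0.01
```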

We summarize in Algorithm 8.5 the random walk Metropolis algorithm with sampling-based error variance and noninformative prior. Some implementations employ the OLS estimate from (8.12), or the previous estimate of $\sigma^2$, for $\sigma_s^2$. Whereas this is in the spirit of "empirical Bayes" inference as discussed in Section 4.8.2, it is noted in [35] that use of the present data to inform the prior can be problematic with small sample sizes and is at odds with the tenets of Bayesian analysis.
The issues associated with subjectively specifying $n_s$ and $\sigma_s^2$ based on prior knowledge can be avoided by using the Jeffreys prior

$$\pi_0(\sigma^2) = \frac{1}{\sigma^2}.$$

It is illustrated in Section 3.3.3 of [34] that this relation results from specification of a prior based on the Fisher information matrix for problems with mutually independent location and scale parameters $q$ and $\sigma^2$. Alternatively, it can be obtained from (8.13) in the limit $\alpha \to 0$, $\beta \to 0$ for the hyperparameters.


Algorithm 8.5 (Random Walk Metropolis with Noninformative Prior).



Remark 8.6. We noted in (7.32) that for models in which the parameter scales vary by several orders of magnitude, one typically employs the scaled parameter $q_s = q./s$ in optimization routines. Here $s$ is a vector whose elements are the magnitude of each parameter and ./ denotes componentwise division; similarly, .× denotes componentwise multiplication. The same scaling can improve the efficiency of optimization routines used to determine $q^0$, the conditioning of $V$, and the efficiency of Algorithm 8.5. Specifically, one would employ the alternative steps:

2. Determine $q_s^0 = \arg\min_{q_s} \sum_{i=1}^n [v_i - f_i(q_s\,.\!\times s)]^2$ and $q^0 = q_s^0\,.\!\times s$.

5. Construct the covariance estimate $V = \sigma_{OLS}^2\left[\mathcal{X}^T(q_s^0\,.\!\times s)\,\mathcal{X}(q_s^0\,.\!\times s)\right]^{-1}$ and $R = \text{chol}(V)$.

6. (b) Construct the candidate $q_s^* = q_s^{k-1} + Rz_k$, and set $q^* = q_s^*\,.\!\times s$.

6. (f) Additionally set $q_s^k = q_s^*$ or $q_s^k = q_s^{k-1}$.

Note that the unscaled parameters $q$ are employed in all model evaluations $f_i(q)$.


8.3.3 Metropolis-Hastings Algorithm

The Metropolis-Hastings algorithm generalizes the Metropolis algorithm to include nonsymmetric jumping or proposal functions $J(q^*|q^{k-1})$. For example, this includes Cauchy distributions

$$J(q^*|q^{k-1}) = \frac{1}{\pi\left[1 + (q^*)^2\right]}$$

and $\chi^2$ distributions. In this case, candidates are accepted with probability $\alpha = \min(1, r)$, where the acceptance ratio is

$$r(q^*|q^{k-1}) = \frac{\pi(v|q^*)\,\pi_0(q^*)\,J(q^{k-1}|q^*)}{\pi(v|q^{k-1})\,\pi_0(q^{k-1})\,J(q^*|q^{k-1})}. \tag{8.16}$$

For symmetric proposal functions $J(q^*|q^{k-1}) = J(q^{k-1}|q^*)$, (8.16) reduces to (8.9). We focus primarily on the Metropolis algorithm with symmetric proposal functions and refer the reader to [92] for details regarding the Metropolis-Hastings algorithm.
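To illustrate (8.16) with a nonsymmetric proposal, the sketch below uses a multiplicative random walk $q^* = q^{k-1}e^{sz}$, $z \sim N(0, 1)$, for a positive scalar parameter. This log-normal proposal is our choice for illustration (it is not one of the Cauchy or $\chi^2$ examples above); for it, $J(q^{k-1}|q^*)/J(q^*|q^{k-1}) = q^*/q^{k-1}$. The target, a Gamma(3, 1) density standing in for $\pi(v|q)\pi_0(q)$, and all parameter values are hypothetical.

```python
import math
import random

def mh_lognormal_walk(log_target, q0, s, M, seed=0):
    """Metropolis-Hastings for a positive scalar parameter using the nonsymmetric
    proposal q* = q^{k-1} exp(s z), z ~ N(0, 1), i.e. a log-normal J(q*|q^{k-1})."""
    rng = random.Random(seed)
    q, lt = q0, log_target(q0)
    chain = [q]
    for _ in range(M):
        q_star = q * math.exp(s * rng.gauss(0.0, 1.0))
        lt_star = log_target(q_star)
        # log r from (8.16); for this proposal J(q^{k-1}|q*) / J(q*|q^{k-1}) = q* / q^{k-1}.
        log_r = lt_star - lt + math.log(q_star / q)
        if rng.random() < math.exp(min(0.0, log_r)):
            q, lt = q_star, lt_star
        chain.append(q)
    return chain

def log_target(q):
    """Hypothetical unnormalized log posterior: a Gamma(3, 1) density on q > 0."""
    return 2.0 * math.log(q) - q

chain = mh_lognormal_walk(log_target, q0=1.0, s=0.5, M=20000, seed=3)
print(sum(chain[1000:]) / len(chain[1000:]))   # should be near the Gamma(3, 1) mean of 3
```

Omitting the Hastings factor `math.log(q_star / q)` would bias the chain toward small values of $q$, which is why the symmetric simplification (8.9) cannot be used here.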


Example 8.7. In Examples 7.15 and 8.2, we considered the estimation of the parameter $C$ for the spring model

$$\ddot z + C\dot z + Kz = 0, \qquad z(0) = 2, \quad \dot z(0) = -C$$

from a frequentist perspective and by direct implementation of Bayes' relation (8.2). Here we illustrate the random walk Metropolis Algorithm 8.5. Recall that for $C^2 - 4K < 0$, the solution is

$$z(t) = 2e^{-Ct/2}\cos\left(\sqrt{K - C^2/4}\cdot t\right).$$
We consider displacement observations at 501 equally spaced points in the time interval [0, 5], so that $t_i = (i - 1)h$ with $h = 0.01$. Synthetic data is simulated with errors $\varepsilon_i \sim N(0, \sigma_0^2)$, where $\sigma_0 = 0.1$ is considered unknown when implementing the MCMC algorithm.


Case i. To compare with Example 8.2, we first take $K = 20.5$ to be known and generate the displacement response with $C_0 = 1.5$. Hence the random parameters are $Q = [C, \sigma^2]$. We consider chains of length $M = 10{,}000$.

The chain, or marginal path, and kernel density estimate for $C$ computed using kde.m are plotted in Figure 8.5. The comparison between the density computed using the random walk Metropolis algorithm and the direct posterior evaluation detailed in Example 8.2 shows that the two are nearly identical. The two will agree increasingly well as $M$ is increased, since the MCMC kernel density estimate converges in the sense of distributions, and as quadrature errors in the normalization constant are decreased. We note that the marginal path for $C$ provides a baseline for comparison when applying the algorithm for multiple model parameters.
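Case i can be mimicked with the following sketch, which combines a random walk Metropolis update of $C$ based on (8.11) with a draw of $\sigma^2$ from (8.15) at every iteration. Since the body of Algorithm 8.5 is not reproduced here, the ordering of the two updates, the starting value $C = 1.45$, the proposal standard deviation 0.03, the short chain length $M = 2000$ (kept small so the pure-Python loop runs quickly), the burn-in of 500 iterations, and the use of (8.12) with $p = 1$ for $\sigma_s^2$ (an empirical-Bayes choice subject to the caveat noted in Section 8.3.2) are all our assumptions.

```python
import math
import random

def spring_response(C, K, t):
    """z(t) = 2 exp(-C t/2) cos(sqrt(K - C^2/4) t) for the underdamped case C^2 < 4K."""
    omega = math.sqrt(K - C * C / 4.0)
    return 2.0 * math.exp(-C * t / 2.0) * math.cos(omega * t)

# Synthetic data mimicking Case i; sigma0 is used only to generate the data.
K, C0, sigma0 = 20.5, 1.5, 0.1
times = [0.01 * i for i in range(501)]
data_rng = random.Random(1)
v = [spring_response(C0, K, t) + data_rng.gauss(0.0, sigma0) for t in times]
n = len(v)

def ss(C):
    """Sum of squares (8.4) for the damping parameter C."""
    return sum((vi - spring_response(C, K, t)) ** 2 for vi, t in zip(v, times))

def sample_chain(C_init, prop_sd, M, n_s=0.01, seed=2):
    """Random walk Metropolis for C with a conjugate (8.15) update of sigma^2."""
    rng = random.Random(seed)
    C, ss_C = C_init, ss(C_init)
    sigma2_s = ss_C / (n - 1)        # (8.12) with p = 1: prior guess and starting value
    sigma2 = sigma2_s
    chain_C, chain_sigma2, accepted = [], [], 0
    for _ in range(M):
        C_star = C + prop_sd * rng.gauss(0.0, 1.0)
        # Prior chi_[0, inf) plus the underdamped restriction C^2 < 4K used by spring_response.
        if 0.0 <= C_star < 2.0 * math.sqrt(K):
            ss_star = ss(C_star)
            log_r = -(ss_star - ss_C) / (2.0 * sigma2)      # log of (8.11)
            if rng.random() < math.exp(min(0.0, log_r)):
                C, ss_C = C_star, ss_star
                accepted += 1
        # Draw sigma^2 from (8.15) given the current sum of squares.
        shape = 0.5 * (n_s + n)
        rate = 0.5 * (n_s * sigma2_s + ss_C)
        sigma2 = 1.0 / rng.gammavariate(shape, 1.0 / rate)
        chain_C.append(C)
        chain_sigma2.append(sigma2)
    return chain_C, chain_sigma2, accepted / M

chain_C, chain_sigma2, acceptance = sample_chain(C_init=1.45, prop_sd=0.03, M=2000)
burn = 500
C_mean = sum(chain_C[burn:]) / (len(chain_C) - burn)
sigma2_mean = sum(chain_sigma2[burn:]) / (len(chain_sigma2) - burn)
print(f"acceptance {acceptance:.2f}, mean C {C_mean:.3f}, mean sigma^2 {sigma2_mean:.4f}")
```

A kernel density estimate of `chain_C[burn:]` could then be compared with the quadrature posterior of Example 8.2, as in Figure 8.5.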


Case ii. Second, we consider the estimation of densities for $Q = [C, K, \sigma^2]$ using synthetic data generated with $K_0 = 20.5$, $C_0 = 1.5$, and $\sigma_0 = 0.1$. We consider first the choice $J(q^*|q^{k-1}) = N(q^{k-1}, V)$ for the proposal distribution. Here

$$V = \begin{bmatrix} 0.000345 & 0.000268 \\ 0.000268 & 0.007071 \end{bmatrix}$$

is the covariance matrix given by (8.12), which is constructed using the analytic sensitivity relations

$$\frac{\partial y}{\partial C} = e^{-Ct/2}\left[\frac{Ct}{\sqrt{4K - C^2}}\sin\left(\sqrt{K - C^2/4}\cdot t\right) - t\cos\left(\sqrt{K - C^2/4}\cdot t\right)\right],$$

$$\frac{\partial y}{\partial K} = -\frac{2t}{\sqrt{4K - C^2}}\, e^{-Ct/2}\sin\left(\sqrt{K - C^2/4}\cdot t\right).$$
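The analytic sensitivities above can be checked against central finite differences, in the spirit of Exercises 7.1 and 7.2. The sketch below is ours; the evaluation times and the stepsize $h = 10^{-6}$ are arbitrary choices.

```python
import math

def response(C, K, t):
    """y(t) = z(t) = 2 exp(-C t/2) cos(sqrt(K - C^2/4) t)."""
    return 2.0 * math.exp(-C * t / 2.0) * math.cos(math.sqrt(K - C * C / 4.0) * t)

def dy_dC(C, K, t):
    """Analytic sensitivity with respect to C; note sqrt(K - C^2/4) = sqrt(4K - C^2)/2."""
    r = math.sqrt(4.0 * K - C * C)
    w = r / 2.0
    return math.exp(-C * t / 2.0) * (C * t / r * math.sin(w * t) - t * math.cos(w * t))

def dy_dK(C, K, t):
    """Analytic sensitivity with respect to K."""
    r = math.sqrt(4.0 * K - C * C)
    return -2.0 * t / r * math.exp(-C * t / 2.0) * math.sin(r / 2.0 * t)

def central_difference(g, x, h):
    """Second-order central difference approximation of g'(x)."""
    return (g(x + h) - g(x - h)) / (2.0 * h)

C, K, h = 1.5, 20.5, 1e-6
for t in (0.5, 1.0, 2.5):
    fd_C = central_difference(lambda c: response(c, K, t), C, h)
    fd_K = central_difference(lambda k: response(C, k, t), K, h)
    print(t, dy_dC(C, K, t) - fd_C, dy_dK(C, K, t) - fd_K)   # differences near zero
```

Evaluating these sensitivities at each $t_i$ gives the rows of $\mathcal{X}$ needed in (8.12).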



Figure 8.5. Marginal path and density for the damping parameter $C$.
Figure 8.6. Marginal paths and densities for the damping parameter $C$ and stiffness parameter $K$ obtained with $J(q^*|q^{k-1}) = N(q^{k-1}, V)$.


The marginal paths and densities are plotted in Figure 8.6. It is observed that $2\sigma_C \approx 0.04$ so that $\sigma_C^2 \approx 0.4 \times 10^{-3}$, whereas $2\sigma_K \approx 0.18$ so that $\sigma_K^2 \approx 0.0081$. These are close to the variance values in the covariance matrix $V$, which illustrates how it incorporates the general anisotropy exhibited by the posterior density. Moreover, the marginal paths in Figure 8.6 exhibit essentially the same degree of mixing as the previous case $Q = [C, \sigma^2]$ plotted in Figure 8.5(a).


To contrast, we illustrate the results obtained with the isotropic proposal function $J(q^*|q^{k-1}) = N(q^{k-1}, sI)$ in Figure 8.7 for three choices of $s$. The mixing in Figure 8.7(a), which was obtained with $s = 9 \times 10^{-4}$, is reasonable but is not as rich as that obtained with the anisotropic proposal function $J(q^*|q^{k-1}) = N(q^{k-1}, V)$. Figure 8.7(b) illustrates the results obtained using a narrower proposal function constructed with $s = 9 \times 10^{-6}$. This yields substantial mixing but poor exploration of the parameter space for $K$. Conversely, the wider proposal function used in Figure 8.7(c) yields poor mixing and chain stagnation since a large number of candidates are rejected. This illustrates the advantage of using the covariance matrix when it can be accurately constructed. Alternatively, in Section 8.6, we will discuss delayed rejection adaptive Metropolis methods that can be used to update the proposal distribution as candidates are accepted and the geometry of the posterior is determined.
Figure 8.7. Sample paths obtained with the proposal functions $J(q^*|q^{k-1}) = N(q^{k-1}, sI)$: (a) $s = 9 \times 10^{-4}$, (b) $s = 9 \times 10^{-6}$, and (c) a wider choice of $s$.


8.4 Stationary Distribution and Convergence Criteria


The random walk Metropolis algorithm provides a Markov chain whose state space is the set of admissible parameter values. The initial distribution is provided by an OLS fit to calibration data. However, it is important to note that the Markov chain is based on samples from the "wrong" distribution in the sense that it is constructed using the proposal density rather than the sought-after posterior density. In this section, we address two questions: (i) why should we expect the chain to have a stationary distribution that coincides with the posterior density, and (ii) what criteria indicate that the chain has converged to this distribution?

In Definition 4.62, we noted that the detailed balance condition $\pi_{k-1}\,p_{k-1,k} = \pi_k\,p_{k,k-1}$ is a sufficient (but not necessary) requirement for stationarity. Since we want to show that the posterior density is the stationary distribution, we take $\pi_k = \pi(q^k|v)$. Similarly, we consider

$$p_{k-1,k} = P(X_k = q^k \mid X_{k-1} = q^{k-1}),$$

which is the probability of transitioning from parameter $q^{k-1}$ to $q^k$. The detailed balance condition in this context can thus be expressed as

$$\pi(q^{k-1}|v)\,p_{k-1,k} = \pi(q^k|v)\,p_{k,k-1}.$$
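Since $p_{k-1,k} = P(\text{proposing } q^k)\,P(\text{accepting } q^k)$, it follows from the definition of the proposal distribution $J(q^k|q^{k-1})$ and acceptance probability $\alpha$ that

$$p_{k-1,k} = J(q^k|q^{k-1})\,\alpha(q^k|q^{k-1}) = J(q^k|q^{k-1})\min\left(1, \frac{\pi(q^k|v)\,J(q^{k-1}|q^k)}{\pi(q^{k-1}|v)\,J(q^k|q^{k-1})}\right).$$

From the relation

$$y\min(1, x/y) = \min(x, y) = x\min(1, y/x),$$

which is established in Exercise 8.1, it follows that

$$\begin{aligned} \pi(q^{k-1}|v)\,p_{k-1,k} &= \pi(q^{k-1}|v)\,J(q^k|q^{k-1})\min\left(1, \frac{\pi(q^k|v)\,J(q^{k-1}|q^k)}{\pi(q^{k-1}|v)\,J(q^k|q^{k-1})}\right) \\ &= \pi(q^k|v)\,J(q^{k-1}|q^k)\min\left(1, \frac{\pi(q^{k-1}|v)\,J(q^k|q^{k-1})}{\pi(q^k|v)\,J(q^{k-1}|q^k)}\right) \\ &= \pi(q^k|v)\,p_{k,k-1}. \end{aligned} \tag{8.17}$$

Hence the detailed balance condition is satisfied for the Metropolis-Hastings acceptance relation and the posterior density is the stationary distribution. We note that the transition kernel for the Markov chain can be defined as

$$p_{ij} = J(q^j|q^i)\min\left(1, \frac{\pi(q^j|v)\,J(q^i|q^j)}{\pi(q^i|v)\,J(q^j|q^i)}\right), \quad j \neq i, \qquad p_{ii} = 1 - \sum_{j \neq i} p_{ij}.$$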
The detailed balance result (8.17) establishes that if chains are run sufficiently long, they will produce samples from the posterior density. However, the question of how long chains must be run to converge to and adequately sample from the posterior is difficult, and analytic convergence and stopping criteria are lacking. It is noted in [42] that the convergence, or burn-in, of MCMC algorithms can be falsified but, in general, not completely verified.

Despite the lack of analytic convergence theory, there are various tests that 
can be used to establish confidence in simulations. We summarize only aspects of 
these tests and refer readers to [42, 92] for details regarding convergence or burn-in 
of MCMC simulations. 

The most direct method for assessing burn-in or convergence is to visually or statistically monitor the marginal paths associated with each parameter, as illustrated in Figures 8.5, 8.6, and 8.7. The initial period during which means appear to transition is often termed the burn-in period, and these values are excluded when computing parameter or response densities since they are not sampled from the stationary or posterior distribution.

The difficulty is that chains can appear stationary for a very large number of simulations and then change in the manner shown in Figure 8.8 for a parameter from a transductive material model [116]. Because MCMC algorithms will determine global minima if run sufficiently long, this can be due to initial sampling in a local minimum followed by the discovery of a minimum with lower residual. However, there is no guarantee that this is a global minimum, and the chain could transition again if another, lower, minimum is found.

Figure 8.8. Shift in the marginal path after 130,000 iterations due to a local minimum in the sum of squares; see [116].

In some cases, the parameter density constructed using the burned-in MCMC chain can be compared with that directly computed using Bayes' relation. Whereas this is feasible only for a moderate number of parameters, which may require adaptive sparse grid quadrature, it can be used to verify (or falsify) the MCMC results.

From a statistical perspective, the percentage of accepted points, termed the acceptance ratio, is often used to quantify whether or not the chain is adequately sampling from the posterior. Because the optimal acceptance ratio depends on the geometry of the posterior, the range of reasonable acceptance ratios is quite large; e.g., values between 0.1 and 0.5 are often considered acceptable. The acceptance ratio is often used to tune the proposal density $J(q^*|q^{k-1})$ to improve mixing. For example, a small acceptance ratio can produce stagnation, as shown in Figures 8.3(d) and 8.7(c). This can be addressed by decreasing the variance of affected parameters to narrow the proposal function.
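The effect of the proposal width on the acceptance ratio can be illustrated with a scalar random walk Metropolis chain on a hypothetical standard normal posterior. The three proposal standard deviations are arbitrary choices meant to mimic proposals that are too narrow, moderate, and too wide.

```python
import math
import random

def acceptance_rate(log_post, q0, prop_sd, M, seed=0):
    """Fraction of accepted candidates in a scalar random walk Metropolis chain."""
    rng = random.Random(seed)
    q, lp, accepted = q0, log_post(q0), 0
    for _ in range(M):
        q_star = q + prop_sd * rng.gauss(0.0, 1.0)
        lp_star = log_post(q_star)
        if rng.random() < math.exp(min(0.0, lp_star - lp)):
            q, lp = q_star, lp_star
            accepted += 1
    return accepted / M

def log_post(q):
    """Hypothetical unnormalized log posterior: standard normal."""
    return -0.5 * q * q

rates = [acceptance_rate(log_post, 0.0, sd, 5000) for sd in (0.1, 2.4, 50.0)]
print(rates)   # decreasing: near 1 for the narrow proposal, near 0 for the wide one
```

A high acceptance ratio alone is therefore not a sign of good mixing; the narrow proposal accepts almost everything while moving very slowly.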

A second commonly employed statistical test is to check the autocorrelation

$$\rho_k = \frac{\sum_{i=1}^{M-k}(q_i - \bar q)(q_{i+k} - \bar q)}{\sum_{i=1}^{M}(q_i - \bar q)^2} = \frac{\text{cov}(q_i, q_{i+k})}{\text{var}(q_i)} \tag{8.18}$$

between components in the chain that are $k$ iterations apart. Because adjacent components are likely correlated due to the Markov property, this test can be used to assess whether the chain is producing nearly independent samples from the posterior. As detailed in [56], low autocorrelation is often indicative of fast convergence.
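The autocorrelation (8.18) is straightforward to compute. The sketch below (function name ours) compares an iid Gaussian sequence with a hypothetical strongly correlated AR(1) sequence $x_{i+1} = 0.9x_i + e_i$, whose lag-1 autocorrelation should be near 0.9.

```python
import random

def autocorrelation(chain, k):
    """Sample autocorrelation (8.18) of a scalar chain at lag k."""
    M = len(chain)
    mean = sum(chain) / M
    num = sum((chain[i] - mean) * (chain[i + k] - mean) for i in range(M - k))
    den = sum((x - mean) ** 2 for x in chain)
    return num / den

rng = random.Random(0)
iid = [rng.gauss(0.0, 1.0) for _ in range(20000)]
ar = [0.0]
for _ in range(19999):
    ar.append(0.9 * ar[-1] + rng.gauss(0.0, 1.0))   # strongly correlated sequence
print(autocorrelation(iid, 1), autocorrelation(ar, 1))
```

In practice one would plot $\rho_k$ against $k$ for each parameter chain after discarding the burn-in period.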

The reader is warned that care must be exercised when interpreting MCMC results reported in the literature. One commonly encounters parameter densities with no mention of the burn-in period or illustration of marginal sample paths. In such cases, it is difficult to verify that the reported density is truly indicative of the posterior. Second, it is not uncommon for authors employing computationally intensive codes to discuss very short burn-in periods. While this may be the case, it can also indicate that the reported length is simply the largest number of simulations that could be run with the code.


8.5 Parameter Identifiability 


We noted in Definition 6.1 that the concept of parameter identifiability quantifies the uniqueness of the input-output map between parameters and responses. Hence parameter identifiability is a property of the model and observations rather than of the inference or estimation procedure. For example, we illustrated in Example 3.2 that we could not uniquely determine $q = [m, c, k]$ in the spring model (3.5) given displacement measurements $z(t)$. Instead we had to reformulate the model in terms of the parameters $K = k/m$ and $C = c/m$. As detailed in Chapter 6, one must reformulate the model or fix certain parameter values to address lack of parameter identifiability.


From the perspective of the likelihood, unidentifiability produces flat regions in the likelihood function or multiple maxima having the same value, as illustrated in Figure 8.9(a). From a Bayesian perspective, we noted in Section 6.3 that unidentifiability can be manifested as posterior joint densities that are nearly single-valued for parameters having independent priors. Figure 8.9(b) illustrates the correlation exhibited by unidentifiable material parameters in the model of [116]. The fact that multiple parameter values yield the same maximum likelihood value can also cause chains to jump in the manner shown in Figure 8.8. For noninformative priors, this can slow or stop the convergence of the chains to the posterior density. As detailed in Section 6.3, however, it is often difficult to differentiate between identifiable and unidentifiable parameters based solely on the width of joint densities. This criterion should therefore be interpreted merely as an indicator that parameter values may not be uniquely determined by the data.

In the random walk Metropolis algorithm, it follows from Property 6.7 that lack of identifiability is manifested by a singular covariance matrix $V$ constructed from the sensitivity relations. This is illustrated for the simple harmonic oscillator model in Exercise 8.5. This reinforces the relation between $V$ and the Fisher information matrix, which quantifies the information content of an experiment.

Figure 8.9. (a) Likelihood for an unidentifiable parameter set and (b) correlation of unidentifiable parameters; see [116].
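This singularity can be checked numerically with the Python sketch below. The sketch makes three assumptions:

* the spring model is $m\ddot z + c\dot z + kz = 0$;
* the initial conditions are $z(0) = 2$ and $\dot z(0) = -C$ with $C = c/m$, chosen so that the underdamped solution is simply $z(t) = 2e^{-Ct/2}\cos(\omega t)$ with $\omega = \sqrt{K - C^2/4}$;
* the parameter values and observation times are hypothetical.

Because $z$ depends on $q = [m, c, k]$ only through $C$ and $K$, scaling all three parameters leaves the response unchanged. Differentiating that invariance gives $m\,\partial z/\partial m + c\,\partial z/\partial c + k\,\partial z/\partial k = 0$. Hence $q$ itself is a null vector of the sensitivity matrix $\chi$, and $\chi^T\chi$ cannot be inverted to form $V$.

```python
import math

def response(q, t):
    """z(t) = 2 exp(-C t/2) cos(omega t) for q = [m, c, k], with C = c/m, K = k/m.

    Assumes z(0) = 2, dz/dt(0) = -C and the underdamped case K > C^2/4."""
    m, c, k = q
    C, K = c / m, k / m
    omega = math.sqrt(K - C * C / 4.0)
    return 2.0 * math.exp(-C * t / 2.0) * math.cos(omega * t)

def sensitivity_matrix(q, times, h=1e-6):
    """Centered finite-difference approximation of chi[i][j] = dz(t_i)/dq_j."""
    chi = []
    for t in times:
        row = []
        for j in range(len(q)):
            step = h * q[j]
            qp, qm = list(q), list(q)
            qp[j] += step
            qm[j] -= step
            row.append((response(qp, t) - response(qm, t)) / (2.0 * step))
        chi.append(row)
    return chi

q = [1.0, 0.5, 20.0]                          # hypothetical [m, c, k]
times = [0.1 * i for i in range(1, 51)]       # hypothetical observation times

# (i) Scaling m, c, k by a common factor leaves the response unchanged.
max_diff = max(abs(response(q, t) - response([3.0 * v for v in q], t)) for t in times)

# (ii) Hence chi q = m dz/dm + c dz/dc + k dz/dk = 0 at every t_i: q is a null
# vector of chi, so chi^T chi is singular and V cannot be formed from its inverse.
chi = sensitivity_matrix(q, times)
null_residual = max(abs(sum(row[j] * q[j] for j in range(3))) for row in chi)
column_scale = max(abs(row[j] * q[j]) for row in chi for j in range(3))
print(f"max response difference {max_diff:.1e}, "
      f"|chi q| = {null_residual:.1e} vs column scale {column_scale:.1e}")
```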

Whereas unidentifiable parameters cannot be uniquely determined using OLS estimators or maximum likelihood estimators, they can in some cases be determined using Bayesian estimators with informative priors for the unidentifiable parameters. Hence Bayesian inference can sometimes be successful for overparameterized or unidentifiable models if informative priors are available.
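As a minimal illustration, consider the hypothetical model $y_i = a\,b\,x_i + \varepsilon_i$ with known error standard deviation. Only the product $ab$ is identifiable. The Python sketch below evaluates the posterior on a uniform $(a, b)$ grid, first with a flat prior on the grid box and then with an informative prior $a \sim N(1, 0.05^2)$. In each case it reports the posterior standard deviation of $b$. The model, data, grid, and prior are illustrative choices, not taken from the text.

```python
import math
import random

# Hypothetical unidentifiable model y_i = a*b*x_i + eps_i: only the product a*b
# enters, so the likelihood is constant along the curves a*b = const.
rng = random.Random(2)
x = [0.1 * i for i in range(1, 21)]
sigma = 0.1
y = [1.0 * 2.0 * xi + rng.gauss(0.0, sigma) for xi in x]   # "true" a = 1, b = 2

Sxx = sum(xi * xi for xi in x)
Sxy = sum(xi * yi for xi, yi in zip(x, y))
Syy = sum(yi * yi for yi in y)

def log_like(a, b):
    theta = a * b
    ss = Syy - 2.0 * theta * Sxy + theta * theta * Sxx   # sum of squared residuals
    return -ss / (2.0 * sigma ** 2)

def marginal_sd_b(log_prior_a, n=301, lo=0.2, hi=3.0):
    """Posterior standard deviation of b on a uniform (a, b) grid over [lo, hi]^2."""
    grid = [lo + (hi - lo) * i / (n - 1) for i in range(n)]
    logs = [[log_like(a, b) + log_prior_a(a) for b in grid] for a in grid]
    mx = max(max(row) for row in logs)
    w_b = [sum(math.exp(logs[i][j] - mx) for i in range(n)) for j in range(n)]
    tot = sum(w_b)
    mean = sum(w * b for w, b in zip(w_b, grid)) / tot
    return math.sqrt(sum(w * (b - mean) ** 2 for w, b in zip(w_b, grid)) / tot)

flat = marginal_sd_b(lambda a: 0.0)                                 # noninformative
informed = marginal_sd_b(lambda a: -0.5 * ((a - 1.0) / 0.05) ** 2)  # a ~ N(1, 0.05^2)
print(f"sd(b): flat prior {flat:.3f}, informative prior on a {informed:.3f}")
```

With the flat prior, the posterior spreads along the ridge $ab \approx 2$. The informative prior on $a$ pins down $b$ to within roughly the prior uncertainty of $a$.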


8.6 Delayed Rejection Adaptive Metropolis (DRAM) 

Whereas the choices (8.8) for the proposal distribution incorporate aspects of parameter scaling and variability, they do not provide mechanisms to incorporate information learned about the posterior distribution as candidate parameters are accepted and the chain progresses. Such mechanisms are provided by various adaptive Metropolis algorithms [11, 103, 208, 255], including the DRAM algorithm [102], which we summarize here.

We note that, because adaptive algorithms employ part of the chain history to update the proposal function, they are no longer Markovian processes, which require that states depend only on the previous state. Hence the convergence criteria discussed in Section 8.4 do not apply, and alternative ergodicity properties must be established to guarantee convergence to the posterior density. General criteria, such as the diminishing adaptation and bounded convergence conditions, that adaptive methods must satisfy to establish convergence to a stationary distribution are provided in [11, 103, 208].


8.6.1 Adaptive Metropolis 

In principle, the adaptive Metropolis (AM) step employed in the DRAM algorithm is quite straightforward. During a nonadaptive period of length $k_0$, chain values $q^0, q^1, \ldots, q^{k_0-1}$ are computed using the initial covariance matrix $V_0 = V$ or $V_0 = D$ employed in the random walk Metropolis Algorithm 8.5. Once adaptation commences, the updated chain covariance matrix at the $k$th step is taken to be

$$V_k = s_p\,\mathrm{cov}(q^0, q^1, \ldots, q^{k-1}) + \varepsilon I_p. \tag{8.19}$$

Here $s_p$ is a design parameter that depends on the dimension $p$ of the parameter space. As detailed in [102], a common choice is $s_p = 2.38^2/p$. The length $k_0$ of the adaptation interval is chosen to balance mixing against the need for sufficient diversity in points to ensure a nonsingular covariance matrix in the initial stages of the chain progression. Shorter adaptation intervals typically produce better mixing and higher acceptance ratios since they accelerate the rate at which information regarding the posterior is incorporated. In practice, $k_0$ is often specified to be approximately 100. The term $\varepsilon I_p$, where $\varepsilon \geq 0$ and $I_p$ is the $p$-dimensional identity matrix, can be used to ensure that $V_k$ is positive definite; one can often take $\varepsilon = 0$.
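A direct (batch) evaluation of (8.19) is sketched below in Python using plain lists; the function names are illustrative. Recomputing the full covariance in this way costs $O(k p^2)$ operations at step $k$, which is what motivates the recursive update that follows.

```python
def cov(samples):
    """Empirical covariance (divisor k - 1) of a list of k p-vectors."""
    k, p = len(samples), len(samples[0])
    mean = [sum(s[j] for s in samples) / k for j in range(p)]
    return [[sum((s[i] - mean[i]) * (s[j] - mean[j]) for s in samples) / (k - 1)
             for j in range(p)] for i in range(p)]

def am_covariance(chain, eps=0.0):
    """V_k = s_p cov(q^0, ..., q^{k-1}) + eps I_p with s_p = 2.38^2 / p, as in (8.19)."""
    p = len(chain[0])
    s_p = 2.38 ** 2 / p
    C = cov(chain)
    return [[s_p * C[i][j] + (eps if i == j else 0.0) for j in range(p)]
            for i in range(p)]

# Hypothetical four-point history in p = 2 dimensions.
history = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
V = am_covariance(history)
print(V)    # diagonal entries s_p * (1/3), off-diagonal entries 0
```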
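In theory, $\mathrm{cov}(q^0, q^1, \ldots, q^{k-1})$ can be computed using the empirical covariance formula

$$\mathrm{cov}(q^0, q^1, \ldots, q^{k-1}) = \frac{1}{k-1}\left(\sum_{i=0}^{k-1} q^i (q^i)^T - k\,\bar q^{k-1} (\bar q^{k-1})^T\right),$$

where $\bar q^{k-1} = \frac{1}{k}\sum_{i=0}^{k-1} q^i$ and the $q^i$ are column vectors. However, this becomes increasingly inefficient as $k$ becomes large. Instead, one employs the recursive relation

$$V_{k+1} = \frac{k-1}{k} V_k + \frac{s_p}{k}\left[k\,\bar q^{k-1} (\bar q^{k-1})^T - (k+1)\,\bar q^{k} (\bar q^{k})^T + q^k (q^k)^T\right] + \frac{\varepsilon}{k} I_p. \tag{8.20}$$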


In a similar manner, the sample mean can be computed recursively as

$$\bar q^k = \frac{1}{k+1}\sum_{i=0}^{k} q^i = \frac{k}{k+1}\cdot\frac{1}{k}\sum_{i=0}^{k-1} q^i + \frac{1}{k+1} q^k = \frac{k}{k+1}\,\bar q^{k-1} + \frac{1}{k+1}\, q^k.$$

It is noted in [102] that the efficiency of the algorithm is improved if adaptation occurs at prescribed intervals; in Algorithm 8.8, we employ intervals of length $k_0$. The ergodicity of this adaptive algorithm is established in [103].
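Exercise 8.4 asks for a verification of the recursive relation (8.20), and a numerical check is a useful companion to the algebra. The Python sketch below proceeds in three steps:

1. It builds $V_{k_0}$ by the batch formula from the first $k_0$ points of a hypothetical random chain.
2. It advances $V$ with (8.20) and the recursive mean.
3. It compares the result with the batch covariance of the full chain.

The chain, $\varepsilon$, and all names are illustrative.

```python
import random

def outer(u, v):
    return [[ui * vj for vj in v] for ui in u]

def batch_V(chain, s_p, eps):
    """V = s_p cov(chain) + eps I, with the empirical covariance (divisor k - 1)."""
    k, p = len(chain), len(chain[0])
    mean = [sum(q[j] for q in chain) / k for j in range(p)]
    S = [[sum(q[i] * q[j] for q in chain) for j in range(p)] for i in range(p)]
    return [[s_p * (S[i][j] - k * mean[i] * mean[j]) / (k - 1) + (eps if i == j else 0.0)
             for j in range(p)] for i in range(p)]

def recursive_step(V, mean_prev, q_new, k, s_p, eps):
    """Map V_k and the mean of q^0..q^{k-1} to V_{k+1} and the mean of q^0..q^k."""
    p = len(q_new)
    mean_new = [k / (k + 1) * mean_prev[j] + q_new[j] / (k + 1) for j in range(p)]
    A, B, Q = outer(mean_prev, mean_prev), outer(mean_new, mean_new), outer(q_new, q_new)
    V_new = [[(k - 1) / k * V[i][j] + s_p / k * (k * A[i][j] - (k + 1) * B[i][j] + Q[i][j])
              + (eps / k if i == j else 0.0) for j in range(p)] for i in range(p)]
    return V_new, mean_new

rng = random.Random(4)
p, k0, M, eps = 3, 10, 500, 1e-3
s_p = 2.38 ** 2 / p
chain = [[rng.gauss(0.0, 1.0 + j) for j in range(p)] for _ in range(M)]

V = batch_V(chain[:k0], s_p, eps)
mean = [sum(q[j] for q in chain[:k0]) / k0 for j in range(p)]
for k in range(k0, M):          # on entry V holds V_k, built from q^0..q^{k-1}
    V, mean = recursive_step(V, mean, chain[k], k, s_p, eps)

V_batch = batch_V(chain, s_p, eps)
max_err = max(abs(V[i][j] - V_batch[i][j]) for i in range(p) for j in range(p))
print(f"max |recursive - batch| = {max_err:.2e}")
```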


8.6.2 Delayed Rejection

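In the standard Metropolis algorithm, chain candidates $q^*$ are accepted with probability

$$\alpha(q^*|q^{k-1}) = \min\left(1, \frac{\pi(q^*|\upsilon)\, J(q^{k-1}|q^*)}{\pi(q^{k-1}|\upsilon)\, J(q^*|q^{k-1})}\right) = \min\left(1, \frac{\pi(q^*|\upsilon)}{\pi(q^{k-1}|\upsilon)}\right),$$

where the second equality holds for symmetric proposal functions $J$. If $q^*$ is rejected, the previous chain value is retained. The delayed rejection (DR) algorithm provides a mechanism for constructing alternative candidates $q^{*2}$ if $q^*$ is rejected, rather than immediately retaining the previous value.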

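As detailed in [102], a second-stage candidate $q^{*2}$ is chosen using the proposal function

$$J_2(q^{*2}|q^{k-1}, q^*) = N(q^{k-1}, \gamma_2^2 V_k),$$

where $V_k = R_k R_k^T$ is the covariance matrix produced by the adaptive algorithm. The notation $J_2(q^{*2}|q^{k-1}, q^*)$ indicates that we are proposing $q^{*2}$ having started at $q^{k-1}$ and rejected $q^*$. The software discussed in Remark 8.9 employs $\gamma_2 = 1/5$, but other values are reasonable. Because $\gamma_2 < 1$, the second-stage proposal function is narrower than the original. This increases the chance that the second-stage candidate is accepted and thereby improves mixing. The probability of accepting the second-stage candidate, having started at $q^{k-1}$ and rejected $q^*$, is

$$\begin{aligned}
\alpha_2(q^{*2}|q^{k-1}, q^*) &= \min\left(1, \frac{\pi(q^{*2}|\upsilon)\, J(q^*|q^{*2})\, J_2(q^{k-1}|q^{*2}, q^*)\,[1 - \alpha(q^*|q^{*2})]}{\pi(q^{k-1}|\upsilon)\, J(q^*|q^{k-1})\, J_2(q^{*2}|q^{k-1}, q^*)\,[1 - \alpha(q^*|q^{k-1})]}\right) \\
&= \min\left(1, \frac{\pi(q^{*2}|\upsilon)\, J(q^*|q^{*2})\,[1 - \alpha(q^*|q^{*2})]}{\pi(q^{k-1}|\upsilon)\, J(q^*|q^{k-1})\,[1 - \alpha(q^*|q^{k-1})]}\right)
\end{aligned} \tag{8.21}$$

due to the symmetry of $J_2$.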

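The form of $\alpha_2$ can be motivated as follows. It was noted in Section 8.4 that for the Metropolis-Hastings algorithm,

$$\begin{aligned}
p_{k-1,k} &= P(X_k = q^k \,|\, X_{k-1} = q^{k-1}) \\
&= P(\text{proposing } q^k)\, P(\text{accepting } q^k) \\
&= J(q^k|q^{k-1})\, \alpha(q^k|q^{k-1}).
\end{aligned}$$

We now consider the case when we accept $q^k = q^{*2}$ having rejected $q^*$, so that

$$\begin{aligned}
p_{k-1,k} &= P(\text{proposing } q^*)\, P(\text{rejecting } q^*)\, P(\text{proposing } q^k)\, P(\text{accepting } q^k) \\
&= J(q^*|q^{k-1})\,[1 - \alpha(q^*|q^{k-1})]\, J_2(q^k|q^{k-1}, q^*)\, \alpha_2(q^k|q^{k-1}, q^*).
\end{aligned}$$

To satisfy the detailed balance condition $\pi(q^{k-1}|\upsilon)\, p_{k-1,k} = \pi(q^k|\upsilon)\, p_{k,k-1}$, we thus require

$$\begin{aligned}
&\pi(q^{k-1}|\upsilon)\, J(q^*|q^{k-1})\,[1 - \alpha(q^*|q^{k-1})]\, J_2(q^k|q^{k-1}, q^*)\, \alpha_2(q^k|q^{k-1}, q^*) \\
&\quad = \pi(q^k|\upsilon)\, J(q^*|q^k)\,[1 - \alpha(q^*|q^k)]\, J_2(q^{k-1}|q^k, q^*)\, \alpha_2(q^{k-1}|q^k, q^*).
\end{aligned}$$

The condition (8.21) guarantees that the detailed balance condition is satisfied and that $\alpha_2 \leq 1$.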
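The detailed balance identity above can be checked numerically for a specific move. The Python sketch below uses the following illustrative setup:

* a hypothetical one-dimensional two-component Gaussian mixture as the unnormalized posterior;
* a Gaussian first-stage proposal with standard deviation 1.5;
* $\gamma_2 = 0.2$.

It evaluates $\alpha_2$ from (8.21). It then compares the probability flux of the move $q^{k-1} \to q^{*2}$ through a rejected $q^*$ with the flux of the reverse move.

```python
import math

def normal_pdf(x, mean, sd):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def target(q):
    """Hypothetical unnormalized posterior pi(q | data): a two-component Gaussian mixture."""
    return math.exp(-0.5 * (q + 1.0) ** 2) + 0.5 * math.exp(-2.0 * (q - 2.0) ** 2)

SD, GAMMA2 = 1.5, 0.2        # first-stage proposal sd; the second stage uses GAMMA2 * SD

def alpha(q_star, q_prev):
    """First-stage acceptance probability for the symmetric proposal J."""
    return min(1.0, target(q_star) / target(q_prev))

def alpha2(q_star2, q_prev, q_star):
    """Second-stage acceptance probability (8.21); the J_2 terms cancel by symmetry."""
    num = target(q_star2) * normal_pdf(q_star, q_star2, SD) * (1.0 - alpha(q_star, q_star2))
    den = target(q_prev) * normal_pdf(q_star, q_prev, SD) * (1.0 - alpha(q_star, q_prev))
    return min(1.0, num / den)

def flux(x, z, rejected):
    """pi(x) times the probability of moving x -> z through the rejected candidate."""
    return (target(x) * normal_pdf(rejected, x, SD) * (1.0 - alpha(rejected, x))
            * normal_pdf(z, x, GAMMA2 * SD) * alpha2(z, x, rejected))

x, rejected, z = -0.8, 3.5, -1.5     # hypothetical q^{k-1}, q*, q*2
forward, backward = flux(x, z, rejected), flux(z, x, rejected)
print(f"alpha2 = {alpha2(z, x, rejected):.4f}, forward {forward:.6e}, backward {backward:.6e}")
```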

If $q^{*2}$ is rejected, a third-stage candidate and acceptance condition can be constructed. Recursive relations for the $j$th-stage candidates $q^{*j}$ and probabilities $\alpha_j(q^{*j}|q^{k-1}, q^*, \ldots, q^{*(j-1)})$ are provided in [102]. We employ only a second-stage candidate in Algorithm 8.8 since this is the default in the referenced software.

In combination, DR and AM provide two different but complementary mechanisms to modify the proposal function. AM provides feedback in the sense that information learned about the posterior through accepted chain candidates is used to update the proposal via the chain covariance matrix. DR is an open-loop mechanism that alters the proposal function in a predetermined manner to improve mixing. The modifications from DR are temporary and have the goal of stimulating mixing, whereas the AM mechanism enacts permanent changes that reflect information learned about the posterior. In Algorithm 8.8, we summarize the DRAM algorithm implemented in the referenced software with a second-stage rejection mechanism. The reader is referred to [102] for details and other combinations of the DR and AM components.
Algorithm 8.8 (Delayed Rejection Adaptive Metropolis Algorithm with Noninformative Prior [102]).

1. Set design parameters $n_s$, $\sigma_s^2$, $k_0$, and number of chain iterates $M$

2. Determine $q^0 = \arg\min_q \sum_{i=1}^n [\upsilon_i - f_i(q)]^2$

3. Set $SS_{q^0} = \sum_{i=1}^n [\upsilon_i - f_i(q^0)]^2$

4. Compute initial variance estimate: $s_0^2 = \dfrac{SS_{q^0}}{n - p}$

5. Construct covariance estimate $V = s_0^2 [\mathcal{X}^T(q^0)\mathcal{X}(q^0)]^{-1}$ and $R = \mathrm{chol}(V)$

6. For $k = 1, \ldots, M$

   (a) Sample $z_k \sim N(0, I_p)$

   (b) Construct candidate $q^* = q^{k-1} + R z_k$

   (c) Sample $u_\alpha \sim U(0, 1)$

   (d) Compute $SS_{q^*} = \sum_{i=1}^n [\upsilon_i - f_i(q^*)]^2$

   (e) Compute
       $\alpha(q^*|q^{k-1}) = \min\left(1, e^{-[SS_{q^*} - SS_{q^{k-1}}]/2s_{k-1}^2}\right)$

   (f) If $u_\alpha < \alpha$,
          Set $q^k = q^*$, $SS_{q^k} = SS_{q^*}$
       else
          Enter DR Algorithm 8.10
       endif

   (g) Update $s_k^2 \sim \text{Inv-gamma}(a_{val}, b_{val})$, where
       $a_{val} = 0.5(n_s + n)$, $b_{val} = 0.5(n_s \sigma_s^2 + SS_{q^k})$

   (h) If $\mathrm{mod}(k, k_0) = 1$
          Update $V_k = s_p\,\mathrm{cov}(q^0, q^1, \ldots, q^k)$
       else
          $V_k = V_{k-1}$
       endif

   (i) Update $R_k = \mathrm{chol}(V_k)$


Remark 8.9. MATLAB software for Algorithm 8.8 of [102] is available at the websites https://wiki.helsinki.fi/display/inverse/Adaptive+MCMC and http://helios.fmi.fi/~lainema/mcmc/.
Algorithm 8.10 (Delayed Rejection Component of DRAM with Noninformative Prior).

1. Set the design parameter $\gamma_2 = 1/5$

2. Sample $z_k \sim N(0, I_p)$

3. Construct second-stage candidate $q^{*2} = q^{k-1} + \gamma_2 R_k z_k$

4. Sample $u_\alpha \sim U(0, 1)$

5. Compute $SS_{q^{*2}} = \sum_{i=1}^n [\upsilon_i - f_i(q^{*2})]^2$

6. Compute $\alpha_2(q^{*2}|q^{k-1}, q^*)$ using (8.21)

7. If $u_\alpha < \alpha_2$,
      Set $q^k = q^{*2}$, $SS_{q^k} = SS_{q^{*2}}$
   else
      Set $q^k = q^{k-1}$, $SS_{q^k} = SS_{q^{k-1}}$
   endif
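Algorithms 8.8 and 8.10 can be condensed into a short Python sketch. It is not the referenced MATLAB software, and it departs from it in four ways:

* it uses a lower-triangular Cholesky factor with $V = RR^T$;
* it adapts every $k_0$ iterations using the full chain history;
* it adds a small $\varepsilon$ to the diagonal to keep the factorization well defined;
* it samples $s_k^2$ from the inverse gamma distribution as the reciprocal of a gamma variate.

The design values $n_s = 0.01$, $\sigma_s^2 = 1$, $k_0 = 100$, $\gamma_2 = 0.2$ and the straight-line test problem with hypothetical data are illustrative assumptions.

```python
import math
import random

def chol(A):
    """Lower-triangular Cholesky factor R with A = R R^T."""
    p = len(A)
    R = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            s = A[i][j] - sum(R[i][m] * R[j][m] for m in range(j))
            R[i][j] = math.sqrt(s) if i == j else s / R[j][j]
    return R

def solve_lower(R, b):
    """Forward substitution for R x = b."""
    x = []
    for i in range(len(b)):
        x.append((b[i] - sum(R[i][m] * x[m] for m in range(i))) / R[i][i])
    return x

def cov(S):
    """Empirical covariance (divisor len(S) - 1) of a list of p-vectors."""
    k, p = len(S), len(S[0])
    mu = [sum(s[j] for s in S) / k for j in range(p)]
    return [[sum((s[i] - mu[i]) * (s[j] - mu[j]) for s in S) / (k - 1)
             for j in range(p)] for i in range(p)]

def dram(ss, q0, V0, n, M, ns=0.01, sig2s=1.0, k0=100, gamma2=0.2, eps=1e-12, seed=0):
    """DRAM sketch (noninformative prior, one DR stage); ss(q) returns SS_q."""
    rng = random.Random(seed)
    p = len(q0)
    s_p = 2.38 ** 2 / p
    q, SS = list(q0), ss(q0)
    s2 = SS / (n - p)                            # initial variance estimate s_0^2
    R = chol(V0)
    chain, accepted = [list(q)], 0

    def propose(scale):                          # q^{k-1} + scale * R z, z ~ N(0, I_p)
        z = [rng.gauss(0.0, 1.0) for _ in range(p)]
        return [q[i] + scale * sum(R[i][m] * z[m] for m in range(i + 1)) for i in range(p)]

    def alpha(SS_from, SS_to):                   # min(1, exp(-(SS_to - SS_from) / 2s^2))
        return math.exp(min(0.0, -(SS_to - SS_from) / (2.0 * s2)))

    def log_J(a, b):                             # log N(b, R R^T) density at a, + const
        w = solve_lower(R, [a[i] - b[i] for i in range(p)])
        return -0.5 * sum(wi * wi for wi in w)

    for k in range(1, M + 1):
        q1 = propose(1.0)
        SS1 = ss(q1)
        a1 = alpha(SS, SS1)
        if rng.random() < a1:
            q, SS, accepted = q1, SS1, accepted + 1
        else:                                    # delayed rejection stage, cf. (8.21)
            q2 = propose(gamma2)
            SS2 = ss(q2)
            one_minus = 1.0 - alpha(SS2, SS1)    # 1 - alpha(q* | q*2)
            if one_minus > 0.0:
                log_r = (-(SS2 - SS) / (2.0 * s2) + log_J(q1, q2) - log_J(q1, q)
                         + math.log(one_minus) - math.log(1.0 - a1))
                if rng.random() < math.exp(min(0.0, log_r)):
                    q, SS, accepted = q2, SS2, accepted + 1
        chain.append(list(q))
        # s_k^2 ~ Inv-gamma(a_val, b_val), drawn as 1 / Gamma(shape=a_val, scale=1/b_val)
        s2 = 1.0 / rng.gammavariate(0.5 * (ns + n), 2.0 / (ns * sig2s + SS))
        if k % k0 == 0:                          # adapt every k0 iterations, cf. (8.19)
            C = cov(chain)
            R = chol([[s_p * C[i][j] + (eps if i == j else 0.0) for j in range(p)]
                      for i in range(p)])
    return chain, accepted / M

# Hypothetical test problem: y = q1 + q2 x with iid N(0, 0.1^2) errors.
rng = random.Random(42)
x = [i / 49 for i in range(50)]
y = [1.0 + 2.0 * xi + rng.gauss(0.0, 0.1) for xi in x]

def ss(q):
    return sum((yi - q[0] - q[1] * xi) ** 2 for xi, yi in zip(x, y))

# OLS start q^0 and V = s_0^2 (X^T X)^{-1}, in closed form for a straight line.
n, sx, sxx = len(x), sum(x), sum(xi * xi for xi in x)
sy, sxy = sum(y), sum(xi * yi for xi, yi in zip(x, y))
det = n * sxx - sx * sx
q_ols = [(sxx * sy - sx * sxy) / det, (n * sxy - sx * sy) / det]
s0sq = ss(q_ols) / (n - 2)
V0 = [[s0sq * sxx / det, -s0sq * sx / det], [-s0sq * sx / det, s0sq * n / det]]

chain, ratio = dram(ss, q_ols, V0, n, M=5000)
kept = chain[1000:]
post_mean = [sum(c[j] for c in kept) / len(kept) for j in range(2)]
print("OLS:", q_ols, "posterior mean:", post_mean, "acceptance:", ratio)
```

For a linear model with a flat prior, the posterior mean coincides with the OLS estimate. Comparing `post_mean` with `q_ols` therefore gives a quick sanity check of the sampler.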


Remark 8.11. It was noted in Remark 8.6 that the performance of the algorithm can be significantly enhanced by using scaled parameters $q_s = q/x_s$ if physical parameter values vary significantly. This can be implemented with the following modified steps, where $.*$ denotes the componentwise product:

2. Determine $q_s^0 = \arg\min_{q_s} \sum_{i=1}^n [\upsilon_i - f_i(q_s .* x_s)]^2$ and $q^0 = q_s^0 .* x_s$

5. Construct covariance estimate $V = s_0^2[\mathcal{X}^T(q_s^0 .* x_s)\mathcal{X}(q_s^0 .* x_s)]^{-1}$ and $R = \mathrm{chol}(V)$

6. (b) Construct candidate $q_s^* = q_s^{k-1} + R z_k$, and set $q^* = q_s^* .* x_s$

6. (f) Additionally set $q_s^k = q_s^*$

DR 3. Construct second-stage candidate $q_s^{*2} = q_s^{k-1} + \gamma_2 R_k z_k$ and set $q^{*2} = q_s^{*2} .* x_s$

DR 7. Additionally set $q_s^k = q_s^{*2}$ or $q_s^k = q_s^{k-1}$


Example 8.12. In Example 7.16, we used frequentist analysis to construct sampling distributions for the parameters $q = [\Phi, h]$ in the model for steady state heat conduction in an uninsulated aluminum rod of length $L$ with a source heat flux $\Phi$ at $x = 0$. As detailed in Example 3.5, $T_s$ is the steady state temperature, $h$ is a convective heat transfer coefficient, and $k = 2.37$ W/(cm·°C) is the thermal conductivity coefficient for aluminum.

Here we construct densities for $\Phi$ and $h$ using both the random walk Metropolis Algorithm 8.5 and the delayed rejection adaptive Metropolis Algorithm 8.8. The residual plot in Figure 7.3(b) motivates the assumption that errors are iid and unbiased. We further assume that they are normally distributed with fixed but unknown variance $\sigma_0^2$. With these assumptions, we can employ the likelihood relation (8.10). We employ the covariance estimate

$$V_0 = \begin{bmatrix} 2.1034 \times 10^{-2} & -2.0286 \times 10^{-6} \\ -2.0286 \times 10^{-6} & 2.0973 \times 10^{-10} \end{bmatrix} \tag{8.23}$$

of (7.46) as the initial proposal function.

We first employ the nonadaptive algorithm to provide a baseline that illustrates the advantages of the DRAM algorithm. The marginal paths, obtained using the random walk Metropolis algorithm with $M = 10^4$ and $M = 10^5$ Monte Carlo iterations, are plotted in Figure 8.10. Because $V_0$ incorporates the anisotropy due to the differing variances of $\Phi$ and $h$, the two chains exhibit similar mixing. However, the acceptance ratio is 0.056, which is smaller than the targeted range 0.1–0.5. The plots obtained with $M = 10^4$ iterations also exhibit regions where the chains briefly stagnate, as illustrated in Figure 8.10(a). This suggests that narrowing the proposal function should improve mixing. We note that the stagnation regions are not visible in the plot of $M = 10^5$ iterates, even though the acceptance ratio remains 0.056, which underscores the need to check the acceptance ratio. The stationarity of the chains indicates that they have burned in and are sampling from the posterior density.

Figure 8.10. Sample paths obtained using the nonadaptive random walk Metropolis Algorithm 8.5 with (a) $M = 10^4$ and (b) $M = 10^5$ iterations.

The marginal paths and kernel density estimates constructed using $M = 10^4$ DRAM iterates are plotted in Figure 8.11. A comparison with the chains plotted in Figure 8.10(a) and (b) illustrates that the algorithm is achieving the goal of enhancing mixing and accelerating burn-in. The chain covariance matrix is

$$V = \begin{bmatrix} 2.4101 \times 10^{-2} & -2.3211 \times 10^{-6} \\ -2.3211 \times 10^{-6} & 2.3869 \times 10^{-10} \end{bmatrix},$$

which is very close to the original covariance matrix $V_0$ in (8.23) obtained from OLS theory. This demonstrates that for this problem, the narrowing of the proposal function in the DR step has a more substantial impact than the proposal modifications in the AM step. The algorithm provides the estimate $\sigma_0^2 = 0.0678$ for the error variance, so $\sigma_0 = 0.2604$. The estimated standard deviations for $\Phi$ and $h$ are $\sigma_\Phi = 0.1552$ and $\sigma_h = 1.5450 \times 10^{-5}$. It is observed in the marginal density plots in Figure 8.11 and the joint density plotted in Figure 8.12 that two standard deviations represent approximately 95% of the density.

Figure 8.11. Marginal paths and densities for the parameters $\Phi$ and $h$ obtained with the DRAM algorithm.

Figure 8.12. Joint sample points for $\Phi$ and $h$.

The frequentist analysis in Example 7.16 yielded the standard deviations $\hat\sigma = 0.2504$, $\sigma_\Phi = 0.1450$, and $\sigma_h = 1.4482 \times 10^{-5}$. We first note that the values of $\sigma$ agree to within 4%. This holds even though $\hat\sigma = 0.2504$ is an estimate, whereas $\sigma_0 = 0.2604$ is obtained from a density sampled through realizations of the Markov chain. Furthermore, we observe that the values of $\sigma_\Phi$ and $\sigma_h$ obtained through Bayesian analysis are within approximately 7% of the frequentist values for the sampling distribution.
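The percentages quoted above follow from simple arithmetic on the reported values, as the short Python check below shows. It only restates numbers from Examples 7.16 and 8.12.

```python
# Standard deviations reported in Example 8.12 (DRAM) and Example 7.16 (frequentist).
bayes = {"sigma": 0.2604, "sigma_Phi": 0.1552, "sigma_h": 1.5450e-5}
freq = {"sigma": 0.2504, "sigma_Phi": 0.1450, "sigma_h": 1.4482e-5}

rel = {key: abs(bayes[key] - freq[key]) / freq[key] for key in bayes}
for key, r in rel.items():
    print(f"{key:9s}: relative difference {100 * r:.1f}%")

# The DRAM error variance 0.0678 is consistent with sigma = 0.2604.
print(f"sqrt(0.0678) = {0.0678 ** 0.5:.4f}")
```

The loop prints relative differences of 4.0%, 7.0%, and 6.7%, and the last line prints 0.2604.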

From a mathematical perspective, the similarity of the sampling distribution and the parameter distribution can be attributed to the approximate normality of the parameter densities. However, care must be exercised when interpreting this result since the sampling distribution is for the parameter estimator rather than the parameters. As detailed in Chapter 7, it thus quantifies uncertainty pertaining to the estimation procedure rather than uncertainty associated with the parameters.


Example 8.13. To illustrate the performance of the delayed rejection adaptive Metropolis algorithm for a system of coupled ODEs with multiple responses, we employ the model

$$\begin{aligned}
\dot T_1 &= \lambda_1 - d_1 T_1 - (1 - \varepsilon) k_1 V T_1, \\
\dot T_2 &= \lambda_2 - d_2 T_2 - (1 - f\varepsilon) k_2 V T_2, \\
\dot T_1^* &= (1 - \varepsilon) k_1 V T_1 - \delta T_1^* - m_1 E T_1^*, \\
\dot T_2^* &= (1 - f\varepsilon) k_2 V T_2 - \delta T_2^* - m_2 E T_2^*, \\
\dot V &= N_T \delta (T_1^* + T_2^*) - c V - [(1 - \varepsilon)\rho_1 k_1 T_1 + (1 - f\varepsilon)\rho_2 k_2 T_2] V, \\
\dot E &= \lambda_E + \frac{b_E (T_1^* + T_2^*)}{T_1^* + T_2^* + K_b} E - \frac{d_E (T_1^* + T_2^*)}{T_1^* + T_2^* + K_d} E - \delta_E E
\end{aligned} \tag{8.24}$$

developed in [2, 3] to provide a framework to investigate control strategies for HIV. As detailed in Example 3.3, $T_1$ and $T_1^*$ represent the populations of uninfected and infected T-lymphocytes, $T_2$ and $T_2^*$ are the corresponding macrophage populations, and $V$ and $E$ denote the populations of free virus and immune effector cells.

Figure 8.13. Chains for $Q = [b_E, \delta, d_1, k_2, \lambda_1, K_b]$.

To construct synthetic data for all six states, we add noise to model solutions computed using the parameter values reported in [3]. We note that clinical data comprised of the total number $T_1 + T_1^*$ of T-lymphocytes and viral load $V$ can be found in [3].

For this example, we used the DRAM algorithm to construct chains and densities for the parameters $Q = [b_E, \delta, d_1, k_2, \lambda_1, K_b]$ with the remaining parameters fixed at the values reported in [3]. After a burn-in period of 5000 iterates, the parameter chains, densities, and joint sample points obtained using 15,000 DRAM iterates are plotted in Figures 8.13–8.15. We note that this is a relatively short burn-in period despite the fact that parameter values vary over eight orders of magnitude. It is observed from the pairwise joint-sample plots in Figure 8.15 that $k_2$ and $\delta$ are clearly correlated, as are $\lambda_1$ and $d_1$. The fact that the parameters are not mutually independent proves important when we revisit this problem in Example 9.14, where we discuss propagation of uncertainty in models.
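For readers implementing Exercise 8.6 outside MATLAB, a direct Python transcription of the right-hand side of (8.24) is sketched below. The dictionary keys (for example `lam1` for $\lambda_1$, `eps` for $\varepsilon$, `NT` for $N_T$) are hypothetical names chosen here. The parameter values in the usage lines are placeholders, not the values of Table 3.1. The function can be passed to any ODE integrator that expects $\dot y = F(y)$; here it is only evaluated at a single state.

```python
def hiv_rhs(state, par):
    """Right-hand side of the HIV model (8.24).

    state = [T1, T2, T1s, T2s, V, E]; par maps hypothetical names to parameter values."""
    T1, T2, T1s, T2s, V, E = state
    eps, f = par["eps"], par["f"]
    inf1 = (1.0 - eps) * par["k1"] * V * T1          # infection of T-lymphocytes
    inf2 = (1.0 - f * eps) * par["k2"] * V * T2      # infection of macrophages
    Ts = T1s + T2s                                   # total infected cells
    dT1 = par["lam1"] - par["d1"] * T1 - inf1
    dT2 = par["lam2"] - par["d2"] * T2 - inf2
    dT1s = inf1 - par["delta"] * T1s - par["m1"] * E * T1s
    dT2s = inf2 - par["delta"] * T2s - par["m2"] * E * T2s
    dV = (par["NT"] * par["delta"] * Ts - par["c"] * V
          - ((1.0 - eps) * par["rho1"] * par["k1"] * T1
             + (1.0 - f * eps) * par["rho2"] * par["k2"] * T2) * V)
    dE = (par["lamE"] + par["bE"] * Ts / (Ts + par["Kb"]) * E
          - par["dE"] * Ts / (Ts + par["Kd"]) * E - par["deltaE"] * E)
    return [dT1, dT2, dT1s, dT2s, dV, dE]

# Placeholder parameter values for illustration only (NOT the values of Table 3.1).
names = ["lam1", "d1", "eps", "k1", "lam2", "d2", "f", "k2", "delta", "m1", "m2",
         "NT", "c", "rho1", "rho2", "lamE", "bE", "Kb", "dE", "Kd", "deltaE"]
par = dict.fromkeys(names, 0.5)
print(hiv_rhs([1.0, 2.0, 0.0, 0.0, 0.0, 3.0], par))
```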
8.7 DiffeRential Evolution Adaptive Metropolis (DREAM)

It was illustrated in Section 8.3 that the scale and orientation of the proposal function critically affect the mixing and exploration of chains in standard random walk implementations of MCMC algorithms. The DRAM algorithm of Section 8.6 significantly improves performance through two mechanisms:

* adaptation updates the chain covariance matrix as information is obtained about the posterior density;
* delayed rejection alters the proposal function in a predefined manner to improve mixing.

For many models, this is sufficiently efficient for constructing parameter densities that can subsequently be employed for quantifying uncertainties in model responses or QoI.

However, there are various regimes in which DRAM algorithms are often not efficient. These include problems in which posterior densities are multimodal, are highly complex, or have heavy tails. In these cases, the single DRAM chain will be slow to traverse the posterior, which can significantly diminish its efficiency. Moreover, the computational overhead associated with complex models, such as the weather, climate, hydrology, and nuclear reactor models discussed in Chapter 1, often precludes the construction of burned-in single chains. In contrast, one can often compute shorter parallel chains using massively parallel architectures.

Building on the framework discussed in Section 8.6, these issues have motivated the development of parallel chain versions of the adaptive Metropolis algorithms. In the interchain adaptation approach detailed in [230], independent parallel chains with early rejection mechanisms are used to adapt the proposal function, which in turn influences the mixing and exploration of future chain elements. This approach is highly parallelizable, improves the convergence of individual chains, and has been applied to climate models.

Differential evolution Markov chain (DE-MC) methods provide an alternative that can be more efficient for problems with multimodal and heavy-tailed densities [246]. This approach can be summarized as follows. For $p$ parameters, $N$ chains $q_i$, $i = 1, \ldots, N$, are simultaneously run in parallel, and the present population is stored in an $N \times p$ matrix $X$. Note that here the subscript designates the $i$th chain rather than the $i$th parameter component. In the original algorithm, candidates $q_i^*$ were constructed by randomly choosing two chains from $X$, without replacement, and adding their weighted difference to $q_i^{k-1}$; that is,

$$q_i^* = q_i^{k-1} + \gamma\left(q_{r_1}^{k-1} - q_{r_2}^{k-1}\right) + e, \quad r_1 \neq r_2 \neq i,$$

for $i = 1, \ldots, N$. Typically, one takes $e$ as realizations of $E \sim N(0, b I_p)$, where $b$ is chosen smaller than the variance of the posterior. An optimal choice for the weight is $\gamma = 2.38/\sqrt{2p}$. During implementation, one often specifies $\gamma = 1$ at every 10th generation to permit direct jumping between modes. Candidates are then accepted with probability $\alpha = \min(1, r)$, where $r$ is the acceptance ratio given by (8.11) or (8.16).
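The DE-MC update can be sketched in a few lines of Python. The sketch makes the following assumptions:

* a hypothetical bimodal two-dimensional posterior, an equal mixture of unit-variance Gaussians centered at $(\pm 3, 0)$;
* $N = 10$ chains and $b = 10^{-4}$;
* chains initialized near the two modes and updated sequentially within each generation.

As described above, $\gamma = 1$ is used every 10th generation. This is a teaching sketch, not the implementation of [246].

```python
import math
import random

def log_target(q):
    """Hypothetical bimodal posterior: equal mixture of N((-3, 0), I) and N((3, 0), I)."""
    a = -0.5 * ((q[0] + 3.0) ** 2 + q[1] ** 2)
    b = -0.5 * ((q[0] - 3.0) ** 2 + q[1] ** 2)
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))   # log-sum-exp

def de_mc(N=10, generations=3000, b=1e-4, seed=0):
    """DE-MC sketch; row i of X holds the current state q_i of chain i (p = 2)."""
    rng = random.Random(seed)
    p = 2
    X = [[(-3.0 if i % 2 == 0 else 3.0) + rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]
         for i in range(N)]
    logp = [log_target(x) for x in X]
    gamma_opt = 2.38 / math.sqrt(2 * p)
    samples, accepted = [], 0
    for g in range(1, generations + 1):
        gamma = 1.0 if g % 10 == 0 else gamma_opt          # mode-jumping generation
        for i in range(N):
            r1, r2 = rng.sample([j for j in range(N) if j != i], 2)   # r1 != r2 != i
            cand = [X[i][d] + gamma * (X[r1][d] - X[r2][d]) + rng.gauss(0.0, math.sqrt(b))
                    for d in range(p)]
            log_c = log_target(cand)
            # accept with probability min(1, r), r = pi(cand) / pi(q_i)
            if rng.random() < math.exp(min(0.0, log_c - logp[i])):
                X[i], logp[i] = cand, log_c
                accepted += 1
        samples.extend(list(x) for x in X)
    return samples, accepted / (generations * N)

samples, acc = de_mc()
kept = samples[5000:]                                   # drop the first 500 generations
mean_y = sum(s[1] for s in kept) / len(kept)
frac_right = sum(s[0] > 0 for s in kept) / len(kept)
print(f"acceptance {acc:.2f}, mean of q[1] {mean_y:.3f}, fraction in right mode {frac_right:.2f}")
```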

The DE-MC algorithm differs from DRAM in that it generates candidates based on current chain information stored in $X$. DRAM instead uses the covariance matrix $V$ or the chain covariance defined in (8.19), which constitute its proposal function. The construction of $q^*$ based on random members of the population facilitates the exploration of multimodal and heavy-tailed posterior distributions. It also provides a mechanism for determining the appropriate scale, shape, and orientation of the proposal function. Further, the parallel chains in DE-MC algorithms learn from each other, whereas in the parallel DRAM implementation independent chains are used to adapt the proposal function. Theory establishing that Markov chains constructed in this manner have a unique stationary distribution, and details regarding the implementation and performance of the DE-MC algorithm, are provided in [246].

Further improvements in efficiency can be realized for many applications when similar evolution algorithms are combined with self-adaptive, randomized subspace sampling. This is the basis for the DiffeRential Evolution Adaptive Metropolis (DREAM) algorithm detailed in [261]. The candidates in this case are randomly generated using the algorithm

$$q_i^* = q_i^{k-1} + (1_{p^*} + f)\,\gamma(\delta, p^*) \sum_{j=1}^{\delta}\left(q_{r_1(j)}^{k-1} - q_{r_2(j)}^{k-1}\right) + e,$$

where $r_1(j), r_2(n) \in \{1, \ldots, N\}$ satisfy $r_1(j) \neq r_2(n) \neq i$ for $j, n = 1, \ldots, \delta$. Here $\delta$ denotes the number of randomly sampled pairs and $p^*$ is the number of parameters that are jointly updated. Finally, $f$ and $e$ are realizations of uniform and normal random $p$-vectors; i.e., $F \sim U_p(-b, b)$ and $E \sim N(0, b^* I_p)$, where $b$ and $b^*$ are small compared to the width of the posterior density.
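The subspace update can be illustrated with a short Python sketch that generates one DREAM-style candidate. Two points are assumptions rather than details from the text:

* the updated subspace is chosen by including each coordinate independently with a fixed probability `cr`, whereas DREAM adapts this choice;
* $\gamma(\delta, p^*) = 2.38/\sqrt{2\delta p^*}$, a common choice in the DREAM literature.

Coordinates outside the subspace are copied unchanged from $q_i^{k-1}$. The $2\delta$ donor chains are drawn without replacement, which satisfies $r_1(j) \neq r_2(n) \neq i$.

```python
import math
import random

def dream_candidate(X, i, delta, cr, b=0.05, b_star=1e-6, rng=random):
    """One DREAM-style candidate for chain i of the population X (list of N p-vectors).

    delta : number of chain pairs used in the differential-evolution jump
    cr    : hypothetical fixed probability of including each coordinate in the subspace
    Returns the candidate and the list of jointly updated coordinates (length p*)."""
    N, p = len(X), len(X[0])
    dims = [d for d in range(p) if rng.random() < cr] or [rng.randrange(p)]
    p_star = len(dims)
    gamma = 2.38 / math.sqrt(2 * delta * p_star)
    donors = rng.sample([j for j in range(N) if j != i], 2 * delta)  # requires N > 2*delta
    r1, r2 = donors[:delta], donors[delta:]
    cand = list(X[i])                          # coordinates outside dims stay unchanged
    for d in dims:
        jump = sum(X[r1[j]][d] - X[r2[j]][d] for j in range(delta))
        f = rng.uniform(-b, b)                 # component of F ~ U(-b, b)
        e = rng.gauss(0.0, math.sqrt(b_star))  # component of E ~ N(0, b*)
        cand[d] = X[i][d] + (1.0 + f) * gamma * jump + e
    return cand, dims

rng = random.Random(3)
X = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(12)]   # N = 12, p = 8
cand, dims = dream_candidate(X, 0, delta=2, cr=0.3, rng=rng)
print("updated coordinates:", dims)
```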

For large parameter dimensions $p$, DREAM employs a random subspace sampling strategy that can decrease $p^*$ from its original value of $p$. This reduces the number of required parallel chains $N$ so that, in theory, DREAM can run with $N < p$, compared with the $N = 2p$ chains required for DE-MC.

Convergence analysis and case studies for DREAM, DREAM$_{(ZS)}$, and MT-DREAM$_{(ZS)}$ are provided in [144, 259, 261]. Specifically, [261] illustrates applications where DREAM exhibits superior performance to DRAM and DE-MC. For moderate- to high-dimensional problems with computationally intensive codes, DREAM shares the advantage of the parallel DRAM algorithms since both can be implemented on massively parallel architectures. For example, the use of MT-DREAM to perform Bayesian inference for 241 parameters in a hydrologic model is illustrated in [144].

Because DREAM is newer than DRAM, there are fewer available MATLAB toolboxes that are configured for general usage. However, that will certainly change in the next few years, and readers are advised to incorporate both in their libraries of Bayesian parameter estimation routines for large, nonlinear engineering and scientific problems.
8.8 Notes and References

MCMC methods have proven highly successful for numerous applications described by data-based or statistical models. This is due in part to improved computational resources and the success of techniques such as the use of conjugate priors and Gibbs samplers, which permit parameter-by-parameter sampling when conditional posterior distributions can be reasonably approximated. However, the application of these techniques to engineering, science, and mathematical models was initially hampered by the following issues:

* the complexity and highly nonlinear dependence on parameters in models prohibited efficient use of conjugate priors and Gibbs samplers;

* appropriate statistical models and likelihood functions were difficult to formulate;

* static proposal functions were ineffective for exploring high-dimensional and complex posterior densities;

* the computational time required for codes associated with phenomena such as nonlinear, coupled, or high-dimensional PDEs prohibited burn-in.


These issues have been addressed in part by the development of algorithms
such as DRAM and DREAM that have the adaptive capabilities to explore complex 
and multimodal posterior distributions and are amenable to implementation on 
massively parallel architectures. As a result, these algorithms are presently being 
employed for PDE models such as those employed for climate simulations. It is 
anticipated that their use will grow substantially as toolboxes evolve and the success 
of these and emerging algorithms is established. 

We have focused primarily on Metropolis algorithms as a prelude to discussing DRAM and DREAM, and hence we have neglected several techniques, such as Gibbs samplers, which have proven highly successful in other contexts. The reader is referred to [92] for details regarding Gibbs samplers and the associated software package BUGS (Bayesian inference Using Gibbs Sampling), which can be run from within the statistical package R but is not yet available in MATLAB. Details about importance sampling can also be found in this reference. Sequential Monte Carlo (SMC) methods [75], also known as particle filters, can provide a more accurate alternative to extended or unscented Kalman filters for data assimilation when the number of samples is sufficiently large [79]. We refer the reader to [50, 240] for general overviews of Bayesian analysis and to [15, 128, 244] for discussion regarding the use of Bayesian inference for parameter estimation. The interpretation of Tikhonov regularization in a Bayesian context is detailed in [127, 128].


8.9 Exercises 

Exercise 8.1. By considering the cases $x < y$ and $x \geq y$, establish the relation

$$y \min(1, x/y) = \min(x, y) = x \min(1, y/x)$$

for $x, y > 0$.
Exercise 8.2. Consider the steady state heat model of Example 8.12 and the temperature data $\upsilon$ compiled in Table 3.2. Use DRAM to compute chains and marginal densities for the parameters $Q = [\Phi, h]$. You should be able to reproduce the results in Example 8.12. Now compute the posterior density directly using Bayes' relation (8.2). You can approximate the integral using tensored 1-D quadrature relations, as detailed in Chapter 11. Compare your posterior density with the joint density obtained using DRAM. Now numerically integrate $\pi(q|\upsilon)$ to construct marginal densities for $\Phi$ and $h$, and compare them to those constructed using DRAM.

Exercise 8.3. Repeat the computations of Example 8.12 using DRAM with the default proposal function $J(q^*|q^{k-1}) = N(q^{k-1}, D)$ rather than the choice $J(q^*|q^{k-1}) = N(q^{k-1}, V_0)$ used to obtain the reported results. Plot the first 200 iterates to show the initial burn-in period. Compare your final chain covariance matrix to that reported in the example.

Exercise 8.4. Verify the recursive relation (8.20).

Exercise 8.5. Show that $V$ is singular if we try to estimate $q = [m, c, k]$ for the spring model

$$m\frac{d^2z}{dt^2} + c\frac{dz}{dt} + kz = 0, \qquad z(0) = 2, \quad \frac{dz}{dt}(0) = -C,$$

where $C = c/m$.

Exercise 8.6. Here we are going to use DRAM to construct densities for parameters in the HIV model (8.24) detailed in Example 3.3 and illustrated in Example 8.13. Synthetic data is provided in the file hiv-data, which can be downloaded from the website http://www.siam.org/books/cs12. The seven columns respectively contain the time and the values for the six states $T_1$, $T_2$, $T_1^*$, $T_2^*$, $V$, and $E$, measured every 5 days for 200 days. We are going to construct chains and densities for the parameters $Q = [d_1, k_2, b_E]$.

You should start by writing a MATLAB code that uses fminsearch to optimize $Q$ based on this data. You can use the initial conditions (3.16) and the remaining parameter values in Table 3.1.

Now use DRAM to compute densities for $d_1$, $k_2$, and $b_E$. You should monitor your chains to ensure that they have burned in or converged. Plot the chains, marginal densities, and pairwise scatterplots. We will revisit this problem in Exercise 9.7.





Chapter 9

Uncertainty Propagation in Models


Prediction is very difficult, especially if it's about the future. - Niels Bohr


We noted in Chapter 1 that predictive estimation is comprised of three components: model calibration, model prediction, and estimation of the validation domain. In the first, measured data is used to quantify input uncertainties associated with parameters, initial or boundary conditions, and forcing functions. For certain applications, input uncertainties can be directly quantified from experiments. For phenomenological models with nonphysical parameters, however, experimental input distributions are typically unavailable, and densities must be estimated using the Bayesian techniques presented in Chapter 8. This is often referred to as inverse uncertainty quantification.

For model prediction, one computes the mean and statistics, prediction intervals, or a pdf for a model response or quantity of interest (QoI). As detailed in Chapter 1, examples of QoI include the following:

* average rainfall amounts for a specified area;
* expected temperature increase over a future time period;
* bounds on void fraction distributions that guarantee specified performance levels and safety margins in a nuclear reactor.

To construct uncertainty bounds for the QoI, one must propagate input uncertainties through the model while accounting for measurement errors. This is the topic of this chapter and Chapter 10. In Section 9.4, we illustrate the difference between confidence or credible intervals, which quantify the accuracy of model fits, and prediction intervals, which incorporate both propagated uncertainties and measurement errors.


Techniques to propagate uncertainties through models include the following. 


* Direct. Eval nation for Linearly Parameterized Models: Whereas the models 
discussed in Chapters 2 and 3 typically exhibit nonlinear parameter depen- 
dencies, linear parameterized models arise in applications such as ini age pro- 
cessing and X-ray tomography We illustrate linear models in Section 9.1 since 
response uncertainties can be computed explicitly in this case. 

* Sampling Methods: These methods, including Monte Carlo techniques, are commonly employed to propagate uncertainties in nonlinear problems. As noted in Section 9.2, response and QoI uncertainties can, in certain cases, be constructed with no additional computational cost if the Bayesian techniques of Chapter 8 are used to determine parameter densities. This technique has the advantage that its convergence rate is independent of the number of parameters, but the disadvantage that it converges at a rate of $1/\sqrt{M}$, where $M$ is the number of simulations. This is due to the solely statistical nature of the method and the fact that it does not exploit regularity associated with the parameter space. For problems with correlated parameters or sufficiently high parameter dimensions, however, this may be the best choice.

* Perturbation Methods: We illustrate in Section 9.3 methods based on truncated Taylor expansions of the model response or QoI evaluated at the parameter mean. To facilitate implementation, first- or second-order expansions are typically employed, which limits the technique's accuracy for applications where the map from inputs to responses is highly nonlinear.

* Spectral Representations: The objective of stochastic Galerkin and collocation methods is to represent uncertain inputs in a manner that facilitates the evaluation of moments and distributions for QoI. This is achieved by employing spectral expansions that exploit the smoothness, often associated with high-dimensional parameter spaces, to improve the convergence of techniques used to compute realizations of the QoI. Due to its breadth, we devote Chapter 10 to this method.

As in Chapters 7 and 8, we consider statistical models of the form

$$\Upsilon_i = f_i(Q) + \varepsilon_i, \quad i = 1, \ldots, n, \tag{9.1}$$

where $\Upsilon_i$ and $\varepsilon_i$ are random $\nu$-vectors representing observations and errors, and $f_i(Q) \in \mathbb{R}^\nu$ is the model response, which is a random variable due to the random parameters $Q$. We assume that the $\varepsilon_i$ are iid and unbiased. We denote realizations of the model response and errors by $f_i(q)$ and $\varepsilon_i$. In general, the model evaluations $f_i(q)$ will depend not only on the $p$ parameters but also on the independent and dependent variables, as detailed in (7.9). We suppress the latter dependencies in the notation since sensitivity analysis and uncertainty quantification are governed by parameter dependence.

For uncertainty propagation, we assume that, at a minimum, mean parameter values $\bar{q}_i$, variances $\mathrm{var}(Q_i)$, and covariances $\mathrm{cov}(Q_i, Q_j)$ have been obtained either experimentally or through statistical analysis. If the Bayesian techniques of Chapter 8 have been employed for inverse uncertainty quantification, one additionally has posterior densities $\pi(q \mid \upsilon)$ for the parameters.

9.1 Direct Evaluation for Linear Models 

Linearly parameterized models arise in applications including X-ray tomography, image processing, inverse heat models (as illustrated in an exercise), and acoustic phenomena quantified by convolution models [27, 176, 250]. Furthermore, linear analysis motivates theory required for nonlinearly parameterized models. We illustrate here uncertainty propagation for a linear model, where one can construct explicit mean and variance relations.

Consider the linear multiple regression model

$$\Upsilon_i = Q_1 + \sum_{j=2}^{p} Q_j x_{ij} + \varepsilon_i, \quad i = 1, \ldots, n, \tag{9.2}$$

which has the matrix-vector representation

$$\Upsilon = XQ + \varepsilon, \tag{9.3}$$

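where $Q = [Q_1, \ldots, Q_p]^T$ are random variables with means $\bar{q} = [\bar{q}_1, \ldots, \bar{q}_p]^T$. The design matrix is

$$X = \begin{bmatrix} 1 & x_{12} & \cdots & x_{1p} \\ \vdots & \vdots & & \vdots \\ 1 & x_{n2} & \cdots & x_{np} \end{bmatrix}. \tag{9.4}$$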

The modeled response is thus $f(Q) = XQ$. We note that for polynomial regression, the independent variables may represent powers, e.g., $x_{i2} = x_i/12$ and $x_{i3} = (x_i/12)^2$ in Example 9.3, in which case (9.3) exhibits a nonlinear dependence on the independent variables but a linear dependence on parameters. Finally, we assume that the errors are iid and unbiased.

It follows from (J.12) and (1.10) that

$$\begin{aligned} E[f_i(Q)] &= \bar{q}_1 + \sum_{j=2}^{p} x_{ij}\bar{q}_j, \\ \mathrm{var}[f_i(Q)] &= \mathrm{var}(Q_1) + \sum_{j=2}^{p}\left[x_{ij}^2\,\mathrm{var}(Q_j) + 2x_{ij}\,\mathrm{cov}(Q_1, Q_j)\right] + 2\sum_{2 \le j < k \le p} x_{ij}x_{ik}\,\mathrm{cov}(Q_j, Q_k) \end{aligned} \tag{9.5}$$

for $i = 1, \ldots, n$. For the linear model (9.3), the mean and variance of $f(Q) = XQ$ can thus be computed directly using mean and covariance values of the parameters.

The development of analogous mean and variance relations for linear models $\Upsilon = XQ + \varepsilon$ with general design matrices $X$ is explored in Exercise 9.1.


Remark 9.1. The relations (9.5) can be used to construct confidence or credible intervals for the linear model $f(Q) = XQ$. As detailed in Section 9.4, this quantifies the accuracy of the model fit but does not indicate the uncertainty associated with subsequent predictions. Prediction intervals are constructed by noting that

$$E(\Upsilon_i) = E[f_i(Q)], \qquad \mathrm{var}(\Upsilon_i) = \mathrm{var}[f_i(Q)] + \mathrm{var}(\varepsilon_i),$$

based on the assumption that the model response and measurement errors are mutually independent and that errors are unbiased.
Remark 9.2. Consider the linear model $f(Q) = XQ$, where $Q_1, \ldots, Q_p$ are mutually independent and normally distributed, $Q_j \sim N(\bar{q}_j, \sigma_j^2)$, so that $V = \mathrm{diag}(\sigma_1^2, \ldots, \sigma_p^2)$. For $\bar{y} = X\bar{q}$, it follows from Theorem 4.21 that

$$f(Q) \sim N(\bar{y}, XVX^T). \tag{9.6}$$

Example 9.3. Consider the height-weight data compiled in Table 7.1 of Example 7.0 with the regression model

$$f(Q) = Q_1 + Q_2(x/12) + Q_3(x/12)^2. \tag{9.7}$$

The Bayesian analysis of Chapter 8 yields the parameter and covariance estimates

$$\bar{q} = [\bar{q}_1, \bar{q}_2, \bar{q}_3]^T = [260.65, -87.74, 11.92]^T,$$

$$V = \begin{bmatrix} \mathrm{var}(Q_1) & \mathrm{cov}(Q_1,Q_2) & \mathrm{cov}(Q_1,Q_3) \\ \mathrm{cov}(Q_2,Q_1) & \mathrm{var}(Q_2) & \mathrm{cov}(Q_2,Q_3) \\ \mathrm{cov}(Q_3,Q_1) & \mathrm{cov}(Q_3,Q_2) & \mathrm{var}(Q_3) \end{bmatrix} = \begin{bmatrix} 778.35 & -287.93 & 26.51 \\ -287.93 & 106.61 & -9.82 \\ 26.51 & -9.82 & 0.91 \end{bmatrix}.$$


The relations (9.5) then yield

$$\begin{aligned} E[f(Q)] &= \bar{q}_1 + (x/12)\bar{q}_2 + (x/12)^2\bar{q}_3, \\ \mathrm{var}[f(Q)] &= \mathrm{var}(Q_1) + (x/12)^2\,\mathrm{var}(Q_2) + (x/12)^4\,\mathrm{var}(Q_3) \\ &\quad + 2(x/12)\,\mathrm{cov}(Q_1,Q_2) + 2(x/12)^2\,\mathrm{cov}(Q_1,Q_3) + 2(x/12)^3\,\mathrm{cov}(Q_2,Q_3), \end{aligned} \tag{9.8}$$

which can be used to predict the expected weights and uncertainties for specified values of the independent height variable $x$.

To illustrate, the relations (9.8) were used to predict the expected weights and $2\sigma$ credible intervals for input heights ranging from 50 to 80 inches. As illustrated in Figure 9.1, the $2\sigma$ credible intervals are very tight in the region from 58 to 72 inches used to estimate the model parameters. As expected, the credible intervals grow significantly when extrapolating to heights outside the calibration region. This will be discussed further in Section 9.4.2 in the context of prediction intervals.

Figure 9.1. (a) Model fit to data with $2\sigma$ credible interval and (b) expanded perspective in the fitted region.
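The relations (9.8) amount to a quadratic form: with $a = [1, x/12, (x/12)^2]$, the mean is $a\bar{q}$ and the variance is $aVa^T$. The sketch below, which assumes NumPy, evaluates both forms for the Example 9.3 estimates and checks that they agree. The function names are hypothetical. Because the quoted covariance entries are rounded to two decimals and the parameters are strongly correlated, the computed band widths should be read as illustrative.

```python
import numpy as np

# Posterior mean and covariance quoted in Example 9.3 (rounded to two decimals).
q_bar = np.array([260.65, -87.74, 11.92])
V = np.array([[778.35, -287.93, 26.51],
              [-287.93, 106.61, -9.82],
              [26.51, -9.82, 0.91]])


def regressors(x):
    """Rows [1, x/12, (x/12)^2] of the design matrix for heights x in inches."""
    h = np.atleast_1d(np.asarray(x, dtype=float)) / 12.0
    return np.column_stack([np.ones_like(h), h, h**2])


def mean_and_var(x):
    """Mean and variance of f(Q) = Q1 + Q2 (x/12) + Q3 (x/12)^2, as in (9.8)."""
    A = regressors(x)
    mean = A @ q_bar
    var = np.einsum("ij,jk,ik->i", A, V, A)   # a V a^T for each row a of A
    return mean, var


def var_expanded(x):
    """The same variance written term by term exactly as in (9.8)."""
    h = np.atleast_1d(np.asarray(x, dtype=float)) / 12.0
    return (V[0, 0] + h**2 * V[1, 1] + h**4 * V[2, 2]
            + 2 * h * V[0, 1] + 2 * h**2 * V[0, 2] + 2 * h**3 * V[1, 2])


heights = np.array([58.0, 66.0, 72.0])
mean, var = mean_and_var(heights)
# 2-sigma credible band; the clip guards against tiny negative values that the
# rounded covariance entries could produce through cancellation.
half_width = 2.0 * np.sqrt(np.clip(var, 0.0, None))
for x, m, w in zip(heights, mean, half_width):
    print(f"x = {x:.0f} in: mean weight {m:.2f}, 2-sigma half-width {w:.2f}")
```

The compact form $aVa^T$ is what generalizes to the sandwich relations of Section 9.3; the expanded form only serves as a check.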


9.2 Sampling Methods 


For applications where distributions for measurement errors and input uncertainties (e.g., due to uncertain parameters, initial or boundary conditions, or forcing functions) have been determined either experimentally or using the Bayesian techniques of Chapter 8, sampling methods can often be used to construct distributions for responses or QoI. In principle, this approach is very intuitive and is implemented by randomly sampling from the measurement error and joint input distributions to construct an ensemble of responses from which response statistics and prediction intervals can be computed. The technique has the additional advantage that its efficiency is essentially independent of the number of parameters since one can simultaneously sample from each parameter distribution.

The disadvantage of sampling methods is that they typically exhibit relatively slow convergence rates, and thus a large number of response realizations are required to construct a reasonable statistical ensemble. For example, Monte Carlo techniques have an $O(1/\sqrt{M})$ convergence rate, where $M$ is the number of realizations. Hence the number of simulations must be increased by a factor of 100 to gain an additional place of accuracy. In this sense, Monte Carlo methods are decelerating since a factor of four increase in computational effort is required to obtain a factor of two improvement in accuracy. For models that require hours to days for a single evaluation, Monte Carlo sampling will be infeasible unless realizations can be constructed in parallel. Whereas Latin hypercube and quasi-Monte Carlo sampling methods exhibit higher convergence rates, they too are often infeasible for computationally complex problems such as applications involving coupled physics or multicomponent biological systems. Details regarding stochastic quadrature methods are provided in Section 11.1.1.
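The $O(1/\sqrt{M})$ rate can be observed directly on a problem with a known answer. The minimal sketch below assumes NumPy and uses the hypothetical test quantity $E[Q^2] = 1$ for $Q \sim N(0, 1)$. It estimates the root-mean-square Monte Carlo error over repeated trials, so a 100-fold increase in $M$ should reduce the error by roughly a factor of 10, that is, one additional place of accuracy.

```python
import numpy as np


def mc_rms_error(M, trials=200, seed=0):
    """RMS error of the M-sample Monte Carlo estimate of E[Q^2] for Q ~ N(0, 1).

    The exact value is 1, so the error of each trial is known; averaging the
    squared error over independent trials estimates the RMS error.
    """
    rng = np.random.default_rng(seed)
    samples = rng.standard_normal((trials, M))
    estimates = np.mean(samples**2, axis=1)      # one estimate per trial
    return np.sqrt(np.mean((estimates - 1.0) ** 2))


errors = {M: mc_rms_error(M, seed=M) for M in (100, 10_000)}
ratio = errors[100] / errors[10_000]
print(errors)
print(f"error ratio for a 100-fold increase in M: {ratio:.1f}")
print(f"theory sqrt(2/M): {np.sqrt(2 / 100):.4f}, {np.sqrt(2 / 10_000):.5f}")
```

Since $\mathrm{var}(Q^2) = 2$, the theoretical RMS error is $\sqrt{2/M}$, independent of how many parameters generated the samples.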


Remark 9.4. The success of sampling methods relies on a reasonable representation of the joint density $\rho_Q(q)$. For problems in which the joint distribution is assumed to be normal or uniform, sampling from $\rho_Q(q)$ is highly efficient. For more general distributions of mutually independent parameters, one can sample from the marginal distributions and employ (5.2) to construct $\rho_Q(q)$. The difficulty arises for correlated, non-Gaussian, nonuniform parameters, which is typically the case for physical models. If marginal distributions and a correlation matrix can be constructed, it was noted in Section 5.2 that one can employ the Nataf transformation in combination with a Cholesky decomposition to obtain a representation based on mutually independent Gaussian random variables.

If this information is not available, one can instead use the methods of Chapter 8 to construct prediction intervals. If the prediction domain lies within the calibration domain used for the Bayesian analysis, response ensembles can in some cases be computed using the model solutions employed to construct the likelihoods central to the Metropolis algorithms. For these cases, input and response distributions have the same computational cost. If memory limitations prohibit storage of all ensembles, one can instead sample from the chain indices to construct response densities and prediction intervals. This is illustrated in Example 9.14 for an ODE system with correlated non-Gaussian parameters. The fact that this technique applies to correlated parameter sets constitutes an advantage since the spectral representations discussed in Chapter 10 require independent parameters or representations of the joint density $\rho_Q(q)$, which in turn rely on the accuracy and numerical feasibility of the transformation techniques discussed in Section 5.2.
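Sampling from chain indices can be sketched as follows, assuming NumPy and a burned-in chain already stored as arrays. The function `sample_by_index` and its arguments are hypothetical names, and the correlated Gaussian "chain" in the usage lines is a synthetic stand-in for DRAM output, not data from the text. Each ensemble member uses one complete parameter row, so correlations in the chain carry over to the response without any transformation; adding measurement error with the variance drawn at the same index turns the credible band into a prediction band.

```python
import numpy as np


def sample_by_index(chain, sigma2_chain, model, x, n_draws=500, alpha=0.05, seed=0):
    """Credible and prediction intervals obtained by sampling chain indices.

    chain        : (M, p) array of burned-in parameter samples
    sigma2_chain : (M,) array of measurement-error variance samples
    model        : callable model(x, q) returning the response at points x
    """
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, chain.shape[0], size=n_draws)   # random chain indices
    # One full parameter row per member preserves parameter correlations.
    responses = np.array([model(x, chain[i]) for i in idx])
    # Add measurement error using the variance sample at the same index.
    noise = rng.standard_normal(responses.shape) * np.sqrt(sigma2_chain[idx])[:, None]
    predictions = responses + noise
    levels = [100 * alpha / 2, 100 * (1 - alpha / 2)]
    credible = np.percentile(responses, levels, axis=0)      # model-fit uncertainty
    prediction = np.percentile(predictions, levels, axis=0)  # plus measurement error
    return credible, prediction


# Synthetic stand-in for a DRAM chain for a straight-line model (hypothetical values).
rng = np.random.default_rng(1)
chain = rng.multivariate_normal([0.7, 1.3], [[0.04, -0.015], [-0.015, 0.008]], size=5000)
sigma2_chain = np.full(5000, 0.09)
x = np.linspace(0.0, 4.5, 10)
credible, prediction = sample_by_index(chain, sigma2_chain, lambda x, q: q[0] + q[1] * x, x)
```

Percentiles of the ensembles, rather than mean plus or minus a multiple of the standard deviation, are used so that asymmetric response distributions are represented.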


9.3 Perturbation Methods 

For large or complex nonlinearly parameterized models, sampling-based uncertainty propagation is often computationally infeasible. In some cases, truncation of multidimensional Taylor expansions for $f(Q)$ yields approximate uncertainty criteria whose accuracy is dictated by the order of the Taylor expansion [50].

For this development, we assume that the density for each component $Q_i$ is symmetric about a nominal value $\bar{q}_i$ and consider the representation

$$Q = \bar{q} + \delta Q = [\bar{q}_1 + \delta Q_1, \ldots, \bar{q}_p + \delta Q_p]^T, \tag{9.9}$$

where $\delta Q$ is the vector of perturbations or uncertainties about $\bar{q}$. One typically takes $\bar{q}$ to be the expected parameter values and $\delta Q$ to be one standard deviation for each of the parameters, but other choices are possible. It is assumed that realized values for $\bar{q}$ and $\delta Q$ have been determined either experimentally or using the model calibration techniques discussed in Chapters 7 and 8.


Single Response 

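Consider first the case of a single response so that $n = \nu = 1$. The $N$th-order Taylor expansion of $f(Q)$ about nominal values $\bar{q}$ with perturbations $\delta Q$ is

$$\begin{aligned} f(Q) &= f(\bar{q}_1 + \delta Q_1, \ldots, \bar{q}_p + \delta Q_p) \\ &= f(\bar{q}) + \sum_{i=1}^{p}\frac{\partial f}{\partial Q_i}(\bar{q})\,\delta Q_i + \frac{1}{2}\sum_{i=1}^{p}\sum_{j=1}^{p}\frac{\partial^2 f}{\partial Q_i\,\partial Q_j}(\bar{q})\,\delta Q_i\,\delta Q_j \\ &\quad + \cdots + \frac{1}{N!}\sum_{i_1=1}^{p}\cdots\sum_{i_N=1}^{p}\frac{\partial^N f}{\partial Q_{i_1}\cdots\partial Q_{i_N}}(\bar{q})\,\delta Q_{i_1}\cdots\delta Q_{i_N}. \end{aligned} \tag{9.10}$$

We subsequently consider the first-order (linear) expansion

$$f(Q) = \bar{y} + \sum_{i=1}^{p} s_i\,\delta Q_i, \tag{9.11}$$

which is reindexed since we are assuming a single response. Here $\bar{y} = f(\bar{q})$ and

$$s_i = \frac{\partial f}{\partial Q_i}(\bar{q})$$

is the sensitivity of the response to the $i$th parameter evaluated at $\bar{q}$.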
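For a random variable $Q = [Q_1, \ldots, Q_p]^T$ with joint pdf $\rho_Q(q)$, the parameter means, variances, and covariances can be expressed as

$$\begin{aligned} \bar{q}_i &= \int_{\mathbb{R}^p} q_i\,\rho_Q(q)\,dq, \qquad \mathrm{var}(Q_i) = \int_{\mathbb{R}^p} (q_i - \bar{q}_i)^2\,\rho_Q(q)\,dq, \\ \mathrm{cov}(Q_i, Q_j) &= \int_{\mathbb{R}^p} (q_i - \bar{q}_i)(q_j - \bar{q}_j)\,\rho_Q(q)\,dq, \end{aligned} \tag{9.12}$$

where $\delta q_i = q_i - \bar{q}_i$. It follows from (9.11) that

$$E[f(Q)] = \bar{y}\int_{\mathbb{R}^p}\rho_Q(q)\,dq + \sum_{i=1}^{p} s_i\int_{\mathbb{R}^p}(q_i - \bar{q}_i)\,\rho_Q(q)\,dq = \bar{y}. \tag{9.13}$$

The first integral is unity since $\rho_Q(q)$ is a density, whereas the second integral is zero since $\bar{q}_i$ is the mean of $Q_i$ (equivalently, by the assumed symmetry of the density about $\bar{q}_i$). The variance of $f(Q)$ is

$$\begin{aligned} \mathrm{var}[f(Q)] &= E\big[(f(Q) - \bar{y})^2\big] = \int_{\mathbb{R}^p}\Big[\sum_{i=1}^{p} s_i\,\delta q_i\Big]^2\rho_Q(q)\,dq \\ &= \sum_{i=1}^{p} s_i^2\int_{\mathbb{R}^p}(\delta q_i)^2\,\rho_Q(q)\,dq + \sum_{i=1}^{p}\sum_{\substack{j=1 \\ j\neq i}}^{p} s_i s_j\int_{\mathbb{R}^p}\delta q_i\,\delta q_j\,\rho_Q(q)\,dq \\ &= \sum_{i=1}^{p} s_i^2\,\mathrm{var}(Q_i) + \sum_{i=1}^{p}\sum_{\substack{j=1 \\ j\neq i}}^{p} s_i s_j\,\mathrm{cov}(Q_i, Q_j). \end{aligned} \tag{9.14}$$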

We note that (9.14) can be written as the matrix relation

$$\mathrm{var}[f(Q)] = S^T V S, \tag{9.15}$$

where $V$ is the covariance matrix for $Q$ and $S^T = [s_1, \ldots, s_p]$ is the row vector of response sensitivities. The relation (9.15) is sometimes referred to as the sandwich relation.

Remark 9.5. If $Q_1, \ldots, Q_p$ are mutually independent and normally distributed, $Q_i \sim N(\bar{q}_i, \sigma_i^2)$, so that $V = \mathrm{diag}(\sigma_1^2, \ldots, \sigma_p^2)$, it follows from Theorem 4.21 that

$$f(Q) \sim N(\bar{y}, S^T V S) \tag{9.16}$$

since $f(Q)$ given by (9.11) is linearly parameterized.
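The single-response relations (9.13) and (9.15) translate into a few lines of code. The sketch below assumes NumPy; `first_order_moments` is a hypothetical name, and the sensitivities $s_i$ are approximated by central finite differences, a swapped-in choice where analytic derivatives could equally be used. The usage lines propagate uncertainty through the product $f(Q) = Q_1Q_2$ of two independent parameters.

```python
import numpy as np


def first_order_moments(f, q_bar, V, rel_step=1e-6):
    """Propagation of moments for a scalar response, following (9.11)-(9.15).

    The sensitivities s_i = df/dQ_i(q_bar) are approximated by central finite
    differences. Returns the mean f(q_bar), the variance s^T V s, and s.
    """
    q_bar = np.asarray(q_bar, dtype=float)
    s = np.empty_like(q_bar)
    for i in range(q_bar.size):
        step = rel_step * max(1.0, abs(q_bar[i]))
        e = np.zeros_like(q_bar)
        e[i] = step
        s[i] = (f(q_bar + e) - f(q_bar - e)) / (2.0 * step)
    mean = f(q_bar)                 # (9.13)
    var = s @ np.asarray(V) @ s     # (9.15), the sandwich relation
    return mean, var, s


# Usage: f(Q) = Q1 * Q2 with independent Q1, Q2 (hypothetical values).
mean, var, s = first_order_moments(lambda q: q[0] * q[1], [2.0, 3.0], np.diag([0.01, 0.04]))
exact_var = 3.0**2 * 0.01 + 2.0**2 * 0.04 + 0.01 * 0.04   # exact variance of the product
```

The first-order result omits the term $\sigma_1^2\sigma_2^2$ in the exact variance of a product, a small instance of the truncation error in the linear expansion.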
Multiple Responses

The mean and variance relations can be directly extended to the case of $n$ random variables $\Upsilon = [\Upsilon_1, \ldots, \Upsilon_n]^T$ and model responses $f(Q) = [f_1(Q), \ldots, f_n(Q)]^T$. The linear Taylor expansion of $f(Q)$ about the nominal value $\bar{q}$ with perturbations $\delta Q$ is

$$f(Q) \approx f(\bar{q}) + S\,\delta Q,$$

where $S$ is the $n \times p$ sensitivity matrix having elements $[S]_{ij} = \frac{\partial f_i}{\partial Q_j}(\bar{q})$. The mean

$$E[f(Q)] = \bar{y} = f(\bar{q}) \tag{9.17}$$

and covariance matrix

$$\mathrm{var}[f(Q)] = S V S^T \tag{9.18}$$

are obtained in a manner analogous to the single-response relations (9.13) and (9.15); see Exercise 9.2.

This Taylor-series-based technique is often referred to as propagation of moments or propagation of errors, and the relations (9.13)-(9.18) are often termed moment propagation relations. The first-order Taylor expansion (9.11) is linear and, as illustrated in Example 9.7, will have limited accuracy in highly nonlinear regimes. Expressions for propagating higher-order moments, computed by retaining additional terms in the Taylor expansion (9.10), are provided in [50]. However, their complexity can preclude their use unless the required accuracy justifies their inclusion.

Finally, we point out that unlike Monte Carlo techniques, which provide a density for the response, the Taylor-based method provides only an estimate for the mean and variance or covariance. Hence it provides a measure of the magnitude but not the shape of the uncertainty.
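For vector-valued responses, the same idea produces the full $n \times n$ response covariance $SVS^T$. The sketch below assumes NumPy and again approximates the $n \times p$ sensitivity matrix by central differences (an assumption, not the text's prescription); `jacobian_fd` and `propagate_moments` are hypothetical names. The usage lines apply it to a hypothetical exponential-decay model observed at three times, so the off-diagonal entries show how a shared parameter correlates the responses.

```python
import numpy as np


def jacobian_fd(f, q_bar, rel_step=1e-6):
    """n x p sensitivity matrix [S]_ij = df_i/dQ_j(q_bar) by central differences."""
    q_bar = np.asarray(q_bar, dtype=float)
    n = np.atleast_1d(f(q_bar)).size
    S = np.empty((n, q_bar.size))
    for j in range(q_bar.size):
        step = rel_step * max(1.0, abs(q_bar[j]))
        e = np.zeros_like(q_bar)
        e[j] = step
        S[:, j] = (np.atleast_1d(f(q_bar + e)) - np.atleast_1d(f(q_bar - e))) / (2.0 * step)
    return S


def propagate_moments(f, q_bar, V):
    """Mean (9.17) and n x n covariance S V S^T (9.18) of a vector response."""
    q_bar = np.asarray(q_bar, dtype=float)
    S = jacobian_fd(f, q_bar)
    return np.atleast_1d(f(q_bar)), S @ np.asarray(V) @ S.T, S


# Usage: exponential decay observed at three times (hypothetical model and values).
t = np.array([0.0, 1.0, 2.0])
mean, cov, S = propagate_moments(lambda q: q[0] * np.exp(-q[1] * t), [2.0, 0.5],
                                 np.diag([0.01, 0.0004]))
```

For a linear map $f(Q) = AQ$ the sensitivity matrix is $A$ itself and the result reduces to $AVA^T$, the multivariate analogue of the direct evaluation in Section 9.1.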


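Example 9.6. We revisit the height-weight Example 9.3 with the linearly parameterized model (9.7). The sensitivity and covariance matrices are

$$S = \begin{bmatrix} \dfrac{\partial f}{\partial Q_1} & \dfrac{\partial f}{\partial Q_2} & \dfrac{\partial f}{\partial Q_3} \end{bmatrix} = \begin{bmatrix} 1 & x/12 & (x/12)^2 \end{bmatrix}$$

and

$$V = \begin{bmatrix} \mathrm{var}(Q_1) & \mathrm{cov}(Q_1,Q_2) & \mathrm{cov}(Q_1,Q_3) \\ \mathrm{cov}(Q_2,Q_1) & \mathrm{var}(Q_2) & \mathrm{cov}(Q_2,Q_3) \\ \mathrm{cov}(Q_3,Q_1) & \mathrm{cov}(Q_3,Q_2) & \mathrm{var}(Q_3) \end{bmatrix},$$

so that

$$\begin{aligned} E[f(Q)] &= \bar{q}_1 + (x/12)\bar{q}_2 + (x/12)^2\bar{q}_3, \\ \mathrm{var}[f(Q)] &= \mathrm{var}(Q_1) + (x/12)^2\,\mathrm{var}(Q_2) + (x/12)^4\,\mathrm{var}(Q_3) \\ &\quad + 2(x/12)\,\mathrm{cov}(Q_1,Q_2) + 2(x/12)^2\,\mathrm{cov}(Q_1,Q_3) + 2(x/12)^3\,\mathrm{cov}(Q_2,Q_3). \end{aligned}$$

As expected, these results are identical to the directly computed mean and variance (9.8) since the first-order Taylor expansion is exact for linearly parameterized models.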
Example 9.7. In (3.8) of Example 3.2, we showed that the amplitude of the harmonic oscillator equation

$$m\frac{d^2z}{dt^2} + c\frac{dz}{dt} + kz = F_0\cos(\omega_F t), \qquad z(0) = z_0, \quad \frac{dz}{dt}(0) = z_1,$$

is $F_0/\sqrt{(k - m\omega_F^2)^2 + (c\omega_F)^2}$, where $\omega_0 = \sqrt{k/m}$ is the natural frequency. In this example, we compare the sampling-based techniques of Section 9.2 with perturbation methods for computing variability in the response

$$f(\omega_F, q) = \frac{1}{\sqrt{(k - m\omega_F^2)^2 + (c\omega_F)^2}},$$

assuming that the parameters $Q = [m, c, k]^T$ are normally distributed with mean $\bar{q} = [\ldots, 8.5]^T$ and covariance matrix

$$V = \begin{bmatrix} 0.002^2 & 0 & 0 \\ 0 & 0.065^2 & 0 \\ 0 & 0 & 0.001^2 \end{bmatrix}. \tag{9.19}$$

We note that the largest relative uncertainty is associated with the damping parameter $c$, which is often the case in damped oscillating systems. Finally, the mean natural frequency is $\omega_0 = 1.7743$ Hz.

The sensitivities as a function of the drive frequency $\omega_F$ are

$$\frac{\partial f}{\partial m} = \frac{\omega_F^2(k - m\omega_F^2)}{[(k - m\omega_F^2)^2 + (c\omega_F)^2]^{3/2}}, \qquad \frac{\partial f}{\partial c} = \frac{-c\,\omega_F^2}{[(k - m\omega_F^2)^2 + (c\omega_F)^2]^{3/2}}, \qquad \frac{\partial f}{\partial k} = \frac{-(k - m\omega_F^2)}{[(k - m\omega_F^2)^2 + (c\omega_F)^2]^{3/2}},$$

and $S^T = [\partial f/\partial m, \partial f/\partial c, \partial f/\partial k]$. Since the parameters are mutually independent and normally distributed, it follows from Remark 9.5 that the response is also normally distributed for each input frequency, with mean $\bar{y} = f(\omega_F, \bar{q})$ and variance $\sigma_p^2 = S^T V S$. We remind the reader that the normal response is due to the fact that we are using a first-order expansion (9.11) that is linear with regard to the parameters.

For the Monte Carlo sampling method, we computed $M = 10{,}000$ realizations of the response $f(\omega_F, q)$ using random samples from the parameter distributions. For each value of $\omega_F$, we constructed mean and standard deviation values

$$\bar{y}_s(\omega_F) = \frac{1}{M}\sum_{m=1}^{M} f(\omega_F, q^m), \qquad \sigma_s^2(\omega_F) = \frac{1}{M-1}\sum_{m=1}^{M}\big[f(\omega_F, q^m) - \bar{y}_s(\omega_F)\big]^2. \tag{9.20}$$

Figure 9.2. (a) Sampling mean $\bar{y}_s$ from (9.20) and perturbation mean $\bar{y} = f(\omega_F, \bar{q})$, and (b) standard deviations $\sigma_s$ from (9.20) and $\sigma_p = \sqrt{S^T V S}$.


The mean and standard deviation for a range of input frequencies are compared in Figure 9.2 for the two methods. It is observed that the values agree for most frequencies but are significantly different near the natural frequency, where the nonlinear parameter effects are most significant. In Figure 9.3, we compare the densities constructed using the two methods near and away from resonance. The kde techniques of Section 4.1 were used to construct the sampling-based density, whereas the Gaussian perturbation method density is given by (9.16). It is observed that away from resonance, the sampling-based density is approximately normal, whereas it is skewed and relatively non-normal at 1.77 Hz, which is close to the natural frequency of $\omega_0 = 1.7743$ Hz. This illustrates further ramifications of the nonlinear effects that are not accurately quantified using the linear perturbation relation.



Figure 9.3. Densities obtained using the sampling and linear perturbation methods at (a) 1.60 Hz and (b) 1.77 Hz.
Figure 9.4. 95% credible intervals computed using the sampling method.

The 95% credible intervals, obtained using the sampling method, are plotted in Figure 9.4. These are easily constructed by ordering the realizations in ascending order and locating the 2.5% and 97.5% quantiles. The asymmetry in credible values near resonance reflects the asymmetry of the densities due to nonlinear effects.

In summary, this example illustrates the limited accuracy of the linear perturbation expansion when the input-to-response map is highly nonlinear. The accuracy of perturbation methods can be improved by including second- and higher-order contributions, but at greater computational cost.
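The comparison in Example 9.7 can be reproduced in outline with the sketch below, which assumes NumPy. The nominal values are hypothetical: $k = 8.5$ and $m = 2.7$ are chosen so that $\sqrt{k/m} \approx 1.7743$, matching the stated natural frequency, and $c = 0.3$ is an arbitrary assumed damping value. The covariance is (9.19), the sampling statistics follow (9.20), the perturbation statistics use the analytic sensitivities of the example, and the 95% interval is read off the sorted samples.

```python
import numpy as np

# Hypothetical nominal values [m, c, k]; only the covariance is taken from (9.19).
q_bar = np.array([2.7, 0.3, 8.5])
V = np.diag([0.002**2, 0.065**2, 0.001**2])


def response(w, m, c, k):
    """f(omega_F, q) = 1 / sqrt((k - m w^2)^2 + (c w)^2)."""
    return 1.0 / np.sqrt((k - m * w**2) ** 2 + (c * w) ** 2)


def sensitivities(w, m, c, k):
    """Analytic sensitivities [df/dm, df/dc, df/dk] from Example 9.7."""
    a = k - m * w**2
    d = a**2 + (c * w) ** 2
    return np.array([w**2 * a, -c * w**2, -a]) / d**1.5


def compare(w, M=10_000, seed=0):
    """Sampling estimates (9.20) versus first-order perturbation estimates."""
    rng = np.random.default_rng(seed)
    q = rng.multivariate_normal(q_bar, V, size=M)            # rows [m, c, k]
    f = response(w, q[:, 0], q[:, 1], q[:, 2])
    y_s, sigma_s = f.mean(), f.std(ddof=1)                    # (9.20)
    S = sensitivities(w, *q_bar)
    y_p, sigma_p = response(w, *q_bar), np.sqrt(S @ V @ S)    # mean and std in (9.16)
    interval = np.percentile(f, [2.5, 97.5])                  # 95% sampling interval
    return y_s, sigma_s, y_p, sigma_p, interval


for w in (0.5, 1.77):
    y_s, sigma_s, y_p, sigma_p, (lo, hi) = compare(w)
    print(f"w_F = {w}: sampling {y_s:.4f} +/- {sigma_s:.4f}, "
          f"perturbation {y_p:.4f} +/- {sigma_p:.4f}, 95% interval [{lo:.4f}, {hi:.4f}]")
```

Away from resonance the two methods should agree closely. Near resonance the response behaves roughly like $1/(c\omega_F)$, which is convex in $c$, so the sampling mean exceeds $f(\omega_F, \bar{q})$, a shift the linear expansion cannot capture.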


9.4 Prediction Intervals 


The linear analysis, sampling techniques, and perturbation methods discussed in Sections 9.1-9.3 address the propagation of parameter uncertainties through models. If Bayesian techniques are used to quantify parameter uncertainties, these propagation methods will indirectly incorporate measurement errors $\varepsilon_i$ since they influence the variability of parameters. However, the direct influence of measurement errors is neglected in these techniques and must be incorporated when constructing prediction intervals for QoI. To illustrate the difference between confidence, credible, and prediction intervals, we first consider linear regression.


9.4.1 Confidence versus Prediction Intervals


Consider the linear model

$$\Upsilon = Xq_0 + \varepsilon, \tag{9.21}$$

where $q_0 \in \mathbb{R}^p$ is assumed fixed but unknown. Errors $\varepsilon = [\varepsilon_1, \ldots, \varepsilon_n]^T$ are assumed to be independent and normally distributed with $\varepsilon_i \sim N(0, \sigma_0^2)$. An example is the regression equation

$$\Upsilon_i = q_1 + \sum_{j=2}^{p} x_{ij}q_j + \varepsilon_i, \quad i = 1, \ldots, n, \tag{9.22}$$

where the design matrix $X$ is specified in (9.4).
We assume that, given measured data, (7.17) and (7.18) are used to compute the parameter estimator $\hat{Q}$ and estimate $\hat{q}$ for the true but unknown parameter $q_0$. We now consider two types of prediction at a point $x_0$ in the domain of the independent variable $x$ but not among the data used to compute $\hat{Q}$ and $\hat{q}$. For the regression equation (9.22), $x_0 = [1, x_{02}, \ldots, x_{0p}]^T$.

The first is the prediction of a new observation $\Upsilon_{x_0}$ at $x_0$, whereas the second is the prediction of the mean response $\mu_{x_0} = E(\Upsilon_{x_0})$. As we will illustrate, the interval estimates differ for the two cases.

Consider first the estimation of the mean response $E(\Upsilon_{x_0})$. We note that

$$\hat{\Upsilon}_{x_0} = x_0^T\hat{Q}$$

is an unbiased point estimator for $\mu_{x_0}$. Furthermore, it follows from (4.17) and property (ii) of (7.19) that

$$\mathrm{var}(\hat{\Upsilon}_{x_0}) = \sigma_0^2\,x_0^T(X^TX)^{-1}x_0. \tag{9.23}$$

Since $\sigma_0^2$ is typically unknown, we employ the variance estimator $\hat{\sigma}^2$ specified in (7.20), which yields the estimator

$$\hat{\sigma}^2(\hat{\Upsilon}_{x_0}) = \hat{\sigma}^2\big[x_0^T(X^TX)^{-1}x_0\big]. \tag{9.24}$$

With the assumption that $\varepsilon_i \sim N(0, \sigma_0^2)$, it follows from Property 7.8 that $\hat{\Upsilon}_{x_0}$ is a linear combination of joint multivariate normal random variables, which implies that its sampling distribution is a normal distribution with mean $\mu_{x_0}$ and variance (9.23), so that

$$\frac{\hat{\Upsilon}_{x_0} - \mu_{x_0}}{\sigma_0\sqrt{x_0^T(X^TX)^{-1}x_0}} \sim N(0, 1).$$

From Definition 4.12, (7.20), and the independence of $\hat{Q}$ and $\hat{\sigma}^2$, it follows that

$$T = \frac{\hat{\Upsilon}_{x_0} - \mu_{x_0}}{\hat{\sigma}\sqrt{x_0^T(X^TX)^{-1}x_0}}$$

has a $t$-distribution with $n - p$ degrees of freedom. The $(1 - \alpha) \times 100\%$ interval estimator for $\mu_{x_0}$ is thus

$$\hat{\Upsilon}_{x_0} \pm t_{n-p,1-\alpha/2}\,\hat{\sigma}\sqrt{x_0^T(X^TX)^{-1}x_0}.$$

This is a confidence interval since $\mu_{x_0}$ is a linear combination of parameters.

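We now consider the construction of interval estimates for the new prediction $\Upsilon_{x_0}$. We again assume that the estimators $\hat{Q}$ and $\hat{\sigma}^2$ have been computed using previous data, in which case $\Upsilon_{x_0}$ will be independent of $\hat{Q}$ and $\hat{\sigma}^2$. It thus follows that the random variable $\Upsilon_{x_0} - \hat{\Upsilon}_{x_0}$ will be normally distributed with mean

$$E(\Upsilon_{x_0} - \hat{\Upsilon}_{x_0}) = 0$$

and variance

$$\mathrm{var}(\Upsilon_{x_0} - \hat{\Upsilon}_{x_0}) = \mathrm{var}(\Upsilon_{x_0}) + \mathrm{var}(\hat{\Upsilon}_{x_0}) = \sigma_0^2\big[1 + x_0^T(X^TX)^{-1}x_0\big]. \tag{9.25}$$

It follows immediately that

$$\frac{\Upsilon_{x_0} - \hat{\Upsilon}_{x_0}}{\sigma_0\sqrt{1 + x_0^T(X^TX)^{-1}x_0}} \sim N(0, 1) \tag{9.26}$$

and

$$T = \frac{\Upsilon_{x_0} - \hat{\Upsilon}_{x_0}}{\hat{\sigma}\sqrt{1 + x_0^T(X^TX)^{-1}x_0}}$$

has a $t$-distribution with $n - p$ degrees of freedom. The interval estimator for $\Upsilon_{x_0}$ is

$$\hat{\Upsilon}_{x_0} \pm t_{n-p,1-\alpha/2}\,\hat{\sigma}\sqrt{1 + x_0^T(X^TX)^{-1}x_0}. \tag{9.27}$$

This is termed the prediction interval for $\Upsilon_{x_0}$. It is constructed by using the point estimates of $\hat{\Upsilon}_{x_0}$ and $\hat{\sigma}$ computed from the data.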


Definition 9.8 (Prediction Interval). The $(1 - \alpha) \times 100\%$ prediction interval for a random response $\Upsilon_{x_0}$ to the linear model (9.21) is the pair of statistics $[\underline{\Upsilon}(X), \overline{\Upsilon}(X)]$ constructed from a random sample $X$ such that

$$P\big(\underline{\Upsilon}(X) < \Upsilon_{x_0} < \overline{\Upsilon}(X)\big) = 1 - \alpha,$$

where $\Upsilon_{x_0}$ is a new observation at the point $x_0$ that is independent of the data used to construct $\underline{\Upsilon}(X)$ and $\overline{\Upsilon}(X)$.
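The confidence interval for $\mu_{x_0}$ and the prediction interval (9.27) differ only by the "1 +" under the square root. The sketch below assumes NumPy and SciPy and uses the usual unbiased estimator $\hat{\sigma}^2 = \|\upsilon - X\hat{q}\|^2/(n - p)$, which we assume matches (7.20). The function name `linear_intervals` and the synthetic data are hypothetical.

```python
import numpy as np
from scipy import stats


def linear_intervals(X, y, x0, alpha=0.05):
    """Point estimate, confidence interval (mean response), and prediction
    interval (new observation) at x0 for the linear model (9.21)."""
    n, p = X.shape
    q_hat, *_ = np.linalg.lstsq(X, y, rcond=None)         # least squares estimate
    resid = y - X @ q_hat
    sigma2_hat = resid @ resid / (n - p)                   # unbiased variance estimate
    lev = x0 @ np.linalg.solve(X.T @ X, x0)                # x0^T (X^T X)^{-1} x0
    t = stats.t.ppf(1.0 - alpha / 2.0, n - p)
    y0 = x0 @ q_hat
    half_ci = t * np.sqrt(sigma2_hat * lev)                # confidence interval
    half_pi = t * np.sqrt(sigma2_hat * (1.0 + lev))        # prediction interval (9.27)
    return y0, (y0 - half_ci, y0 + half_ci), (y0 - half_pi, y0 + half_pi)


# Usage with synthetic straight-line data (hypothetical values).
rng = np.random.default_rng(3)
x = np.linspace(1.0, 3.5, 26)
X = np.column_stack([np.ones_like(x), x])
y = 0.6 + 1.2 * x + rng.normal(0.0, 0.3, size=x.size)
y0, ci, pi = linear_intervals(X, y, np.array([1.0, 2.25]))
```

Solving with `X.T @ X` rather than forming its inverse is a numerical convenience; both give $x_0^T(X^TX)^{-1}x_0$.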


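Remark 9.9. In Section 7.3, we illustrated that linearization of the nonlinear regression model

$$\Upsilon = f(q_0) + \varepsilon$$

about $q_0$ yielded estimators that were analogous to the linear case, with $X$ replaced by the sensitivity matrix $X(q)$, where $X_{ij}(q) = \partial f_i(q)/\partial q_j$. As detailed in [219], the same is true for the prediction interval, which is

$$\hat{\Upsilon}_{x_0} \pm t_{n-p,1-\alpha/2}\,\hat{\sigma}\sqrt{1 + x_0^T\big[X^T(\hat{q})X(\hat{q})\big]^{-1}x_0},$$

where $x_0$ now denotes the vector of sensitivities of the model response at the prediction point, evaluated at $\hat{q}$.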

9.4.2 Extrapolation

To illustrate the increased uncertainty inherent to extrapolation, we consider the univariate regression problem

$$\Upsilon_i = q_1 + q_2x_i + \varepsilon_i,$$

which is simply (9.22) with $p = 2$. The design matrix in this case is

$$X = \begin{bmatrix} 1 & x_1 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix}, \tag{9.28}$$

so that

$$(X^TX)^{-1} = \frac{1}{nS}\begin{bmatrix} \sum_{i=1}^{n} x_i^2 & -\sum_{i=1}^{n} x_i \\ -\sum_{i=1}^{n} x_i & n \end{bmatrix}, \tag{9.29}$$

where $S = \sum_{i=1}^{n}(x_i - \bar{x})^2$ and $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ is the mean of the points used to estimate $q_1$ and $q_2$; see Exercise 9.3. The point estimator for $E(\Upsilon_{x_0})$ is now

$$\hat{\Upsilon}_{x_0} = \hat{Q}_1 + \hat{Q}_2x_0$$

and, as developed in Exercise 9.4,

$$\mathrm{var}(\hat{\Upsilon}_{x_0}) = \sigma_0^2\left[\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S}\right]. \tag{9.30}$$

The confidence interval for $\mu_{x_0} = E(\Upsilon_{x_0})$ is

$$\hat{\Upsilon}_{x_0} \pm t_{n-2,1-\alpha/2}\,\hat{\sigma}\sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S}}, \tag{9.31}$$

whereas the prediction interval for $\Upsilon_{x_0}$ is

$$\hat{\Upsilon}_{x_0} \pm t_{n-2,1-\alpha/2}\,\hat{\sigma}\sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S}}. \tag{9.32}$$


The relations (9.31) and (9.32) demonstrate that the quality of both the fit and the prediction degrades as the distance $|x_0 - \bar{x}|$ increases. In particular, care must be exercised if extrapolating with $x_0$ outside the region $[x_{\min}, x_{\max}]$. Details regarding extrapolation for the multivariate problem can be found in [96].
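The growth of the bands with $|x_0 - \bar{x}|$ can be tabulated without any data values, because the factors multiplying $t_{n-2,1-\alpha/2}\,\hat{\sigma}$ in (9.31) and (9.32) depend only on the design points. The sketch below assumes NumPy; `band_factors` is a hypothetical name, and 26 equally spaced points on $[1, 3.5]$ are an assumption about the spacing of the data in Example 9.11.

```python
import numpy as np


def band_factors(x_data, x0):
    """Factors multiplying t_{n-2,1-alpha/2} * sigma_hat in (9.31) and (9.32)."""
    x_data = np.asarray(x_data, dtype=float)
    n = x_data.size
    x_bar = x_data.mean()
    S = np.sum((x_data - x_bar) ** 2)
    lev = 1.0 / n + (np.asarray(x0, dtype=float) - x_bar) ** 2 / S   # (9.30) / sigma_0^2
    return np.sqrt(lev), np.sqrt(1.0 + lev)


# 26 equally spaced points on [1, 3.5] (assumed spacing).
x_data = np.linspace(1.0, 3.5, 26)
x0 = np.array([2.25, 3.5, 5.0, 7.0])   # center, edge, and two extrapolation points
conf, pred = band_factors(x_data, x0)
print(np.round(conf, 3), np.round(pred, 3))
```

Far from the data the confidence factor grows roughly linearly in $|x_0 - \bar{x}|$, while the prediction factor never falls below 1 because of the measurement-error contribution.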

Remark 9.10. Prediction intervals constructed in this manner are pointwise in the sense that they apply to individually specified future measurements. This is weaker than the concept of simultaneous $(1 - \alpha) \times 100\%$ prediction intervals, which specify the probability that all future measurements or model evaluations lie within the interval. As detailed in [219], the Bonferroni method is one technique for constructing the wider simultaneous prediction intervals.
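For $k$ future observations, the Bonferroni approach replaces $\alpha$ by $\alpha/k$ in each interval, so the multiplier $t_{n-p,1-\alpha/2}$ becomes $t_{n-p,1-\alpha/(2k)}$ and the joint coverage is at least $1 - \alpha$. A minimal sketch, assuming SciPy; `t_multiplier` is a hypothetical name and the choice of 10 future points is illustrative.

```python
from scipy import stats


def t_multiplier(alpha, dof, k=1):
    """Multiplier t_{dof, 1 - alpha/(2k)} for k intervals with joint level >= 1 - alpha.

    k = 1 gives the pointwise multiplier used in (9.27) and (9.32); k > 1 applies
    the Bonferroni adjustment, which divides alpha equally among the k intervals.
    """
    return stats.t.ppf(1.0 - alpha / (2.0 * k), dof)


n, p = 26, 2                                      # e.g., a straight-line fit to 26 points
pointwise = t_multiplier(0.05, n - p)
simultaneous = t_multiplier(0.05, n - p, k=10)    # 10 future observations (hypothetical)
print(f"pointwise {pointwise:.4f}, Bonferroni for 10 points {simultaneous:.4f}")
```

The simultaneous intervals are obtained by substituting the larger multiplier into (9.27) or (9.32); the widths grow only slowly with $k$ because the $t$ quantile grows slowly in its tail.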


Example 9.11. To illustrate these concepts, consider the model

$$\Upsilon_i = q_1 + q_2x_i + \varepsilon_i$$

with the true parameters $q_0 = [0.6, 1.2]^T$. We specify $\varepsilon_i \sim N(0, \sigma_0^2)$, with $\sigma_0 = 3$, to construct synthetic data at the 26 points $x_i$ shown in Figure 9.5(a). In the same figure, we plot the 95% confidence and prediction intervals specified by (9.31) and (9.32) using the estimated parameter values $\hat{q} = [0.6920, 1.2752]^T$. We note that both intervals are tightest at $x = 2.25$ and that the uncertainty associated with both the model fit and predictions grows more rapidly for extrapolation outside the region $[x_{\min}, x_{\max}] = [1, 3.5]$. The observation that 2 of the 26 data points lie outside the 95% prediction interval is consistent with its definition.

Figure 9.5. Data, point estimates, and (a) 95% confidence and prediction intervals obtained using the linear theory of Section 9.4.1, and (b) 95% credible and prediction intervals produced by the Bayesian analysis discussed in Example 9.13.


9.4.3 Prediction Intervals for Uncertainty Quantification

The relations (9.27) and (9.32) for the prediction intervals are based on the sampling distribution (9.26) for $\Upsilon_{x_0} - \hat{\Upsilon}_{x_0}$, where $\hat{\Upsilon}_{x_0}$ is the estimator for the fixed, but unknown, parameter $\mu_{x_0}$. As noted in Remark 7.1, sampling distributions do not always correspond with distributions for associated random variables, so care must be exercised if considering them for uncertainty quantification. For example, use of the relations (9.27) and (9.32) will clearly yield inaccurate prediction intervals when the actual distribution for the mean response is highly non-Gaussian, since the relation (9.26) for the sampling distribution is Gaussian. Hence, while the analysis of Section 9.4.1 motivates the general framework required to construct prediction intervals for uncertainty quantification, we must interpret the construction of response variances in a manner consistent with the assumption that inputs and QoI are random variables with associated distributions.

From (9.25), we note that the variance of $\Upsilon_{x_0} - \hat{\Upsilon}_{x_0}$ is the sum of the measurement error variance $\sigma_0^2$ and the variance $\sigma^2(\hat{\Upsilon}_{x_0})$ associated with the mean model response. The measurement error can be specified from experiments or estimated using the Bayesian techniques of Chapter 8. The distribution for the model response is constructed by propagating input uncertainties through the model using the techniques detailed in Sections 9.1-9.3 and Chapter 10. The model response distribution yields the credible interval, which is a measure of the model fit. The sum of the propagated uncertainty and measurement errors provides the prediction interval, which is typically required for predictive estimation.


Remark 9.12. For the linear and linearized models discussed in Sections 9.1 and 9.3, $\sigma^2(\hat{\Upsilon}_{x_0})$ is specified by (9.24), whereas orthogonality properties of the Hermite or Legendre polynomials used to represent Gaussian or uniform distributions can be exploited to provide relations for $\sigma^2(\hat{\Upsilon}_{x_0})$ when using the spectral methods of Chapter 10. If no other moments are specified, this will yield Gaussian distributions for the propagated response and credible intervals that will likely agree with (9.24), which is based on the sampling distribution. The sampling methods of Section 9.2 can be used to directly construct credible intervals for non-Gaussian distributions.


It is illustrated in Examples 9.13 and 9.14 that the prediction intervals constructed by sampling from the indices of the DRAM chains are consistent with the sampling distribution theory of Sections 9.4.1 and 9.4.2 for certain problems, but also permit the quantification of non-Gaussian response intervals for nonlinear problems with potentially non-Gaussian parameter distributions.


Example 9.13. We revisit Example 9.11 but, in this case, we employ the Bayesian theory of Chapter 8 to construct densities for the parameters $q_1$ and $q_2$ and the measurement variance $\sigma_0^2$. We then propagate the input uncertainties through the model to construct credible intervals and prediction intervals comprised of the summed measurement and response uncertainties. The results obtained using the DRAM algorithm described in Section 8.6 are plotted in Figure 9.5(b). A comparison with Figure 9.5(a) illustrates that in this case, the propagated uncertainties agree with those computed using the sampling distribution theory of Sections 9.4.1 and 9.4.2. As illustrated in the next example, this will not be true in general.

Example 9,14. In Example 8.13, we illustrated the use of the DRAM algorithm to 
construct densities ft >r the parameters Q = [?>#. 5, rfi, k 2 , Ai , ift] 1 and error variance 
a 2 in the HIV model 


Ti = Ai -diTi-0 VT U 

7F = A^ — d^T'y — (l — /t ) k'2 VT'i i 
Tf = (1 - £)fci VTi - ST{ - m\ETii 
t; = ( 1 - fs)h 2 VT 2 - 6T 2 - m 2 ETj t 
V = N r J(TT + T 2 H ) - cV - |(l - + (l 


E- Ae T 


M?T + Tl) d B (T^ + T£) 


7T + T* 4- K 


7’r 4- T* 4 K d 


fe) fi2 k 2 T 2 ]V. 

& e E 


of [2, 3]. Here $T_1$ and $T_1^*$ represent the populations of uninfected and infected
T-lymphocytes, $T_2$ and $T_2^*$ are the corresponding macrophage populations, and $V$ and $E$
denote the populations of free virus and immune effector cells. The pairwise joint
sample plots in Figure 8.15 illustrate that $k_2$, $\delta$ and $\lambda_1$, $d_1$ are correlated, which
violates the assumption of mutually independent parameters generally required for
the stochastic Galerkin, collocation, and discrete projection methods of Chapter 10.
In this example, we illustrate the construction of credible and prediction intervals
for the states by randomly sampling from the burned-in parameter chains illustrated
in Figure 8.13, as shown for two chains in Figure 9.6.

To construct 95% credible intervals, we sampled by index from the parameter
chains to construct realizations of the model. This uncertainty was added
to the estimated error variance to construct prediction intervals. These credible
and prediction intervals, along with the point estimates and synthetic data, are
illustrated for the immune effector cell count $E$ in Figure 9.7. We note that both
intervals are asymmetric with respect to the point estimates over part of the time
interval, as shown in Figure 9.7(b). This is due to the highly nonlinear nature of
the problem in combination with the slightly non-Gaussian parameter densities shown
in Figure 8.14. This asymmetric and non-Gaussian behavior of the response would not
be quantified using the sampling distribution theory of Sections 9.4.1 and 9.4.2.
This illustrates the necessity of propagating input uncertainties through the model
to accurately quantify the distribution for the QoI.
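The index-based sampling used in this example can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used to produce Figure 9.7: the decay model, the synthetic correlated chain, and the function name `index_sample_intervals` are hypothetical stand-ins for the HIV model and the DRAM output. Sampling whole chain rows by index keeps correlated pairs such as $k_2$ and $\delta$ together, and the empirical quantiles make no Gaussian assumption, so asymmetric intervals can emerge when the model is nonlinear.

```python
import numpy as np

def index_sample_intervals(chain, sigma2, model, t, n_samples=1000, level=0.95, seed=0):
    """Credible and prediction intervals by sampling rows of a burned-in chain.

    chain  : (n_chain, p) array of parameter samples; sampling whole rows by
             index keeps the correlations between parameters intact.
    sigma2 : (n_chain,) array of measurement-error variance samples.
    model  : callable model(q, t) returning the response at the times t.
    """
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, chain.shape[0], size=n_samples)
    responses = np.array([model(chain[i], t) for i in idx])  # (n_samples, len(t))
    # Prediction: add observation noise drawn with the variance from the same row
    noise = rng.standard_normal(responses.shape) * np.sqrt(sigma2[idx])[:, None]
    tail = (1.0 - level) / 2.0
    credible = np.quantile(responses, [tail, 1.0 - tail], axis=0)
    prediction = np.quantile(responses + noise, [tail, 1.0 - tail], axis=0)
    return credible, prediction

# Hypothetical stand-ins: a decay model and a synthetic correlated "chain"
def decay_model(q, t):
    return q[0] * np.exp(-q[1] * t)

rng = np.random.default_rng(1)
chain = rng.multivariate_normal([2.0, 0.5], [[0.01, 0.004], [0.004, 0.0025]], size=5000)
sigma2 = np.full(5000, 0.04)
t = np.linspace(0.0, 5.0, 11)
credible, prediction = index_sample_intervals(chain, sigma2, decay_model, t)
```

Each row of `credible` and `prediction` holds the lower and upper bounds at the times `t`; the prediction band is wider because it also carries the measurement noise.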


9.5 Notes and References 


For certain applications, operator-based or moment methods can be used to quantify
response uncertainties. Because these methods are quite problem-dependent, we
do not focus on them but instead refer readers to [266] for details regarding these
methods.

The reader is referred to [50, 53] for details regarding the perturbation techniques
of Section 9.4 and further discussion of their use for sensitivity analysis and
uncertainty quantification. These perturbation techniques are also employed in the
“best-estimate” data assimilation framework detailed in [52, 54]. This framework is
based on the principle of maximum entropy, and it provides best-estimate calibrated
parameters and responses along with parameter, response, and parameter-response
covariance matrices. The response covariance matrix can be used directly to
construct credible intervals for the model response, along with prediction intervals
if the measurement errors are incorporated in the manner detailed in Section 9.4.3.
The theory and applications in [50, 52, 53] are based on first-order, and hence
linear, expansions, which can limit their accuracy in highly nonlinear regimes.
Theory and applications utilizing higher-order expansions are provided in [197].


Figure 9.6. Five index-based samples from the chains for $k_2$ and $\delta$ (chain values
plotted against chain index).

Figure 9.7. Mean, synthetic data, and 95% credible and prediction intervals for
$E$: (a) full time interval and (b) reduced time interval to illustrate asymmetric and
non-Gaussian behavior.

9.6 Exercises 

Exercise 9.1. Consider the linearly parameterized problem

$$Y = XQ + \varepsilon,$$

where $Q = [Q_1, \ldots, Q_p]$ and $\varepsilon = [\varepsilon_1, \ldots, \varepsilon_n]$ are random vectors and $X \in \mathbb{R}^{n \times p}$ is a
deterministic and known matrix. We assume that $\varepsilon \sim N(0, V)$, where $V = \sigma_0^2 I_{n \times n}$,
and that the mean $\bar{q}$ and covariance matrix $V_q$ for $Q$ are known. Use the results
from Exercise 1.5 to establish mean and covariance relations for $Y$ analogous to
(9.5).

Exercise 9.2. Derive the relations (9.17) and (9.18) for the mean and variance of 
T when there are h random variables and model responses. 


Exercise 9.3. For the design matrix $X$ defined in (9.28), use Cramer's rule to
derive the inverse relation (9.29).

Exercise 9.4. Use (9.29) to derive (9.30) using the matrix relation (9.23).


Exercise 9.5. Consider the heat equation with constant diffusivity $\alpha$, $f(t,x) = 0$,
and boundary values $T_\ell = T_r = 0$. It was established in Exercise 3.1 that a finite
difference discretization with $N + 1$ spatial gridpoints yields the vector relation

T .h i _ ^ + 1 T°. 


where $T^j = [T_{1,j}, \ldots, T_{N-1,j}]^T$ and the $(N-1) \times (N-1)$ matrix $A$ is defined in
(3.51). Here we take a = 2. = 1 + ih with h = -j and consider the initial heat 

distribution T (l to be a random vector with mean T {} r = I l)] n i = I .... ,7, 

and covariance matrix i = OA" I - The temporal stepsize is taken to be k = 0.01 
to satisfy the stability constraint 


Lei T- denote the solution at tf d_ 


and consider the statistical model

$$T^f = \mathcal{A} T^0 + \varepsilon,$$


where A — A ' 1 and the l n e as i ire in .e ut error is iLd and e J\ (o.o.or*-/.-;,?)- t se 
the results of Exercise 9.1 to compute credible and prediction intervals for $T^f$, and
discuss their relation to the distribution for $T^0$ in the context of the diffusive nature
of the heat equation.


Exercise 9.6. Consider the steady-state heat model

$$\frac{d^2 T_s}{dx^2} = \frac{2(a+b)h}{abk}\,[T_s(x) - T_{amb}], \quad 0 < x < L,$$
$$\frac{dT_s}{dx}(0) = \frac{\Phi}{k}, \qquad \frac{dT_s}{dx}(L) = \frac{h}{k}\,[T_{amb} - T_s(L)]$$

detailed in Example 3.5 and illustrated in Example 8.12 in the context of Bayesian
model calibration. Here we employ a subset of the data from Table 3.2 to construct
prediction intervals that extrapolate beyond the calibration domain. Employ the DRAM
code discussed in Section 8.6 to construct densities for $\Phi$, $h$, and $\sigma^2$ using the data
compiled in Table 9.1. You can use the thermal conductivity value $k = 2.37$ for
aluminum. By sampling from the densities, construct credible and prediction
intervals for $x \in [10, 66]$ and plot them with the complete data set from Table 3.2.
Does the correct percentage of data points outside the calibration domain lie within
the prediction interval?


x (cm)      22      26      30      34      38      42      46      50      54
Temp (°C)   57.96   50.90   44.84   39.75   36.16   33.31   31.15   29.28   27.88


Table 9.1. Steady-state temperatures measured at locations x for an aluminum rod.
Exercise 9.7. We revisit the HIV model

$$
\begin{aligned}
\dot{T}_1 &= \lambda_1 - d_1 T_1 - (1 - \varepsilon) k_1 V T_1,\\
\dot{T}_2 &= \lambda_2 - d_2 T_2 - (1 - f\varepsilon) k_2 V T_2,\\
\dot{T}_1^* &= (1 - \varepsilon) k_1 V T_1 - \delta T_1^* - m_1 E T_1^*,\\
\dot{T}_2^* &= (1 - f\varepsilon) k_2 V T_2 - \delta T_2^* - m_2 E T_2^*,\\
\dot{V} &= N_T \delta (T_1^* + T_2^*) - cV - [(1 - \varepsilon)\rho_1 k_1 T_1 + (1 - f\varepsilon)\rho_2 k_2 T_2]V,\\
\dot{E} &= \lambda_E + \frac{b_E (T_1^* + T_2^*)}{T_1^* + T_2^* + K_b}E - \frac{d_E (T_1^* + T_2^*)}{T_1^* + T_2^* + K_d}E - \delta_E E
\end{aligned}
$$

illustrated in Examples 3.3, 8.12, and 9.14. Use your DRAM code from Exercise 8.6
to construct 95% credible and prediction intervals for each of the states. Does the
correct percentage of data lie in the prediction interval?




Chapter 10 


Stochastic Spectral 
Methods 


The objective of stochastic Galerkin, collocation, and discrete projection methods
can be viewed from two perspectives. In one sense, they provide techniques for
constructing surrogate models of the type discussed in Chapter 13 for high-dimensional
problems in the parameter space based on low-order expansions. This interpretation
lends itself to Bayesian model calibration, sensitivity analysis, design, and
control implementation. From the perspective of uncertainty propagation, in which
one is sampling from the posterior density, it can be useful to view them as
techniques to significantly reduce the number of deterministic model solutions
required to construct moments associated with a QoI. This latter goal is achieved by
employing spectral expansions that exploit the smoothness often exhibited by
high-dimensional parameter spaces to construct sampling or solution techniques with
convergence rates that are significantly faster than the $1/\sqrt{M}$ Monte Carlo rates,
where $M$ is the number of samples, for moderate parameter dimensions.

We note that all of the techniques discussed in this chapter require either
mutually independent parameters or a representation of the joint posterior density
for implementation. As detailed in Section 5.2, parameters in typical applications
are often correlated, and representations for $p_Q(q)$ are usually unavailable or not
feasible. If marginal distributions and a correlation matrix are provided, one can
employ a Nataf transformation in combination with a Cholesky decomposition to
construct a formulation with mutually independent Gaussian random variables. If
this information is not available, sampling from indices of parameter chains, in the
manner detailed in Remark 9.4 and illustrated in Example 9.14, may constitute the
only option for constructing prediction intervals.


10.1 Spectral Representation of Random Processes 

The goal when constructing spectral expansions is to represent random processes
in a manner that exploits the smoothness often exhibited by high-dimensional
parameter spaces and facilitates the construction of moments for QoI. Depending on
the basis choice, these expansions are often termed polynomial chaos (PC) or
generalized polynomial chaos (gPC) expansions. The term dates back to Wiener, who
used it in the context of Hermite expansions employed for constructing a physical
theory of chaos [202], and it is well established in the literature. In the context of
uncertainty quantification, however, it is a misnomer since systems under
investigation are not typically chaotic. To the degree possible, we avoid it and
instead use the more descriptive terminology spectral expansions.


10.1.1 Polynomial Expansions

To motivate the expansions used to represent stochastic processes, we consider
sequences $\{Q_i(\omega)\}_{i=1}^{\infty}$ of random variables defined on the sample space $\Omega$ of the
probability space $(\Omega, \mathcal{F}, P)$. We let $\mathcal{P}_k$ denote the space of polynomials with
argument $Q_i$ having degree less than or equal to $k$ and let $\hat{\mathcal{P}}_k$ be the set of
polynomials in $\mathcal{P}_k$ that are orthogonal to $\mathcal{P}_{k-1}$. The space $\hat{\mathcal{P}}_k$ is sometimes termed
the Wiener PC of order $k$ when considering Gaussian variables. We note that since the
random variables are functions $Q_k: \Omega \to \mathbb{R}$, the polynomials in $\hat{\mathcal{P}}_k$ can be
interpreted as functionals.

As detailed in [57], second-order (finite variance) random variables $u$ can be
represented as an infinite expansion

$$u(\omega) = u_0 P_0 + \sum_{i_1=1}^{\infty} u_{i_1} P_1(Q_{i_1}) + \sum_{i_1=1}^{\infty}\sum_{i_2=1}^{i_1} u_{i_1 i_2} P_2(Q_{i_1}, Q_{i_2}) + \sum_{i_1=1}^{\infty}\sum_{i_2=1}^{i_1}\sum_{i_3=1}^{i_2} u_{i_1 i_2 i_3} P_3(Q_{i_1}, Q_{i_2}, Q_{i_3}) + \cdots \qquad (10.1)$$

of increasing interaction terms $P_k(Q_{i_1}, \ldots, Q_{i_k})$, where $u_0, u_{i_1}, u_{i_1 i_2}, \ldots$ are real
coefficients. This can be written more compactly as

$$u(\omega) = \sum_{k=0}^{\infty} u_k \Psi_k(Q_1, Q_2, \ldots), \qquad (10.2)$$

where there is a one-to-one correspondence between the coefficients and polynomials
in (10.1) and (10.2).

In practice, one typically considers a finite set of $p$ random variables $Q_1, \ldots, Q_p$
with a limited number $n$ of interaction terms. For example, truncation of (10.1) at
second-order interactions yields

$$u(\omega) \approx u_0 P_0 + \sum_{i_1=1}^{p} u_{i_1} P_1(Q_{i_1}) + \sum_{i_1=1}^{p}\sum_{i_2=1}^{i_1} u_{i_1 i_2} P_2(Q_{i_1}, Q_{i_2}), \qquad (10.3)$$

whereas truncation of (10.2) to $K + 1$ terms yields the expression

$$u^K(\omega) = \sum_{k=0}^{K} u_k \Psi_k(Q_1, Q_2, \ldots, Q_p), \qquad (10.4)$$

where $K + 1 = \dfrac{(n+p)!}{n!\,p!}$.
We now consider the representation of a random process $u(t, x, Q(\omega))$ that is a
function of a random vector $Q(\omega) = [Q_1(\omega), \ldots, Q_p(\omega)]: \Omega \to \mathbb{R}^p$. As in Section 5.1,
we exploit the equivalence between realizations $\omega \in \Omega$ and values of the random
vector $Q(\omega) \in \Gamma \subset \mathbb{R}^p$, where $\Gamma_i = Q_i(\Omega)$ and $\Gamma = \prod_{i=1}^{p} \Gamma_i$, to pose the problem
in the image probability space $(\Gamma, \mathcal{B}(\Gamma), p_Q(q)dq)$, where $p_Q(q)$ is the joint density
associated with $Q$. For $(t, x) \in [0, T] \times \mathcal{D}$, we separate spatio-temporal and random
dependencies to obtain the finite-dimensional representation

$$u^K(t, x, Q) = \sum_{k=0}^{K} u_k(t, x)\Psi_k(Q), \qquad (10.5)$$

where $u_k(t, x)$ are deterministic coefficients and $\Psi_k(Q)$ are orthogonal polynomials
that form a basis for the random component of the solution. We refer the reader
to [148, 266] for details regarding the strong and weak nature of approximations and
simply note that at some level, they often invoke the Cameron-Martin theorem.

To construct $u^K$, one must specify appropriate basis functions $\Psi_k(Q)$ and
constraints to determine the coefficients. We illustrate in Section 10.2 that stochastic
Galerkin and collocation techniques can be used to specify $u_k(t, x)$ for fairly general
classes of differential equations. We focus on classes of globally defined orthogonal
polynomials and refer the reader to [19] for details regarding piecewise polynomial
bases.


10.1.2 Basis Construction For a Single Random Variable 

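For a single continuous random variable $Q$, we take $\psi_k(Q)$ to be 1-D global
polynomials that are orthogonal with respect to the density $p_Q(q)$ and indexed so that
$\psi_0 = 1$. It follows that

$$\mathbb{E}[\psi_0(Q)] = 1 \qquad (10.6)$$

and

$$\mathbb{E}[\psi_i(Q)\psi_j(Q)] = \int_\Gamma \psi_i(q)\psi_j(q)p_Q(q)\,dq = \langle \psi_i, \psi_j \rangle_\rho = \delta_{ij}\gamma_i, \qquad (10.7)$$

where $\langle \cdot, \cdot \rangle_\rho$ denotes the $L^2$ inner product on the interval $\Gamma$ with the weight $p_Q(q)$.
The normalization factor is

$$\gamma_i = \mathbb{E}[\psi_i^2(Q)] = \langle \psi_i, \psi_i \rangle_\rho. \qquad (10.8)$$

We note that some texts employ the bracket notation $\langle \psi_i \rangle$ to specify the
expectation $\mathbb{E}[\psi_i(Q)]$ in the image probability space $(\Gamma, \mathcal{B}(\Gamma), p_Q(q)dq)$.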


Statistics of Orthogonal Polynomial Expansions 

The orthogonal polynomial expansion (10.5) has the advantage that statistical
and sensitivity analysis can be performed either analytically or with minimal
computational effort once the coefficients have been determined. We illustrate the
characterization of the mean and variance and refer the reader to |C>8. 2J l for details
regarding the use of this expansion to compute the global Sobol sensitivity indices
discussed in Chapter 15.


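Property 10.1 (Mean and Variance). For fixed values of $t$ and $x$, the mean of
$u^K$ is given by

$$\mathbb{E}[u^K(t, x, Q)] = \mathbb{E}\Big[\sum_{k=0}^{K} u_k(t, x)\Psi_k(Q)\Big] = u_0(t, x)\,\mathbb{E}[\Psi_0(Q)] + \sum_{k=1}^{K} u_k(t, x)\,\mathbb{E}[\Psi_k(Q)] = u_0(t, x) \qquad (10.9)$$

since $\mathbb{E}[\Psi_0(Q)] = 1$ and, by orthogonality with $\Psi_0 = 1$, $\mathbb{E}[\Psi_k(Q)] = 0$ for $k \ge 1$.
Similarly, the variance is

$$
\begin{aligned}
\mathrm{var}[u^K(t, x, Q)] &= \mathbb{E}\big[(u^K(t, x, Q) - \mathbb{E}[u^K(t, x, Q)])^2\big]\\
&= \mathbb{E}\Big[\Big(\sum_{k=0}^{K} u_k(t, x)\Psi_k(Q) - u_0(t, x)\Big)^2\Big]\\
&= \mathbb{E}\Big[\Big(\sum_{k=1}^{K} u_k(t, x)\Psi_k(Q)\Big)^2\Big]\\
&= \sum_{k=1}^{K} u_k^2(t, x)\,\gamma_k,
\end{aligned}
\qquad (10.10)
$$

where the last equality results from the orthogonality of the basis functions and
the definition (10.7) for $\gamma_k$. Higher-order moments and correlation functions can be
computed in a similar manner.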
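As a quick check of (10.9) and (10.10), the following Python sketch computes the mean and variance directly from expansion coefficients. The test function is an assumption chosen because its Hermite expansion is known in closed form: for $Q \sim N(0,1)$, $e^{Q} = e^{1/2}\sum_k H_k(Q)/k!$ with $\gamma_k = k!$ from (10.13), so the exact mean and variance are $e^{1/2}$ and $e^2 - e$. The function name `pc_mean_variance` is illustrative.

```python
import math

def pc_mean_variance(coeffs, gammas):
    """Mean (10.9) and variance (10.10) of u^K = sum_k u_k Psi_k.

    coeffs[k] = u_k and gammas[k] = gamma_k = E[Psi_k^2]; assumes Psi_0 = 1.
    """
    mean = coeffs[0]
    var = sum(u * u * g for u, g in zip(coeffs[1:], gammas[1:]))
    return mean, var

# Illustrative case: exp(Q), Q ~ N(0, 1), has Hermite coefficients e^{1/2}/k!
K = 20
coeffs = [math.exp(0.5) / math.factorial(k) for k in range(K + 1)]
gammas = [math.factorial(k) for k in range(K + 1)]   # gamma_k = k!, see (10.13)
mean, var = pc_mean_variance(coeffs, gammas)
print(mean, math.exp(0.5))        # exact mean e^{1/2}
print(var, math.e**2 - math.e)    # exact variance e^2 - e
```

No sampling is involved: once the coefficients are known, both moments are simple sums.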


Polynomials for Normal and Uniform Densities

We now illustrate two choices for the orthogonal polynomials that are
commonly used in spectral representations.


Example 10.2 (Hermite Polynomials for $Q \sim N(0, 1)$). The pdf is

$$p_Q(q) = \frac{1}{\sqrt{2\pi}}\,e^{-q^2/2}, \qquad (10.11)$$

which is defined on $\Gamma = \mathbb{R}$. The Hermite polynomials

$$H_0(Q) = 1, \quad H_1(Q) = Q, \quad H_2(Q) = Q^2 - 1, \quad H_3(Q) = Q^3 - 3Q, \quad H_4(Q) = Q^4 - 6Q^2 + 3 \qquad (10.12)$$

constitute the natural choice for $\psi_k(Q)$ since they are orthogonal over the real
line with respect to this density. This was the basis choice for the original PC
formulations.

The normalization constants can be expressed as

$$\gamma_i = i!. \qquad (10.13)$$


We note that the polynomials (10.12) are sometimes referred to as “probabilist”
Hermite polynomials to differentiate them from “physicist” Hermite polynomials
that are orthogonal with respect to the weight

$$\rho(q) = e^{-q^2}. \qquad (10.14)$$

Unfortunately, essentially all tables that specify Gauss-Hermite quadrature points
employ the weight (10.14) rather than (10.11). This necessitates that commonly
tabulated quadrature points and weights be scaled for uncertainty propagation. To
illustrate, we let $w^r$, $x^r$ denote the quadrature weights and nodes corresponding to
the weight (10.14). It follows that

$$\int_{\mathbb{R}} g(q)\,\frac{1}{\sqrt{2\pi}}e^{-q^2/2}\,dq = \frac{1}{\sqrt{\pi}}\int_{\mathbb{R}} g(\sqrt{2}\,x)\,e^{-x^2}\,dx \approx \sum_{r=1}^{R} \frac{w^r}{\sqrt{\pi}}\,g(\sqrt{2}\,x^r). \qquad (10.15)$$

Hence the weights and nodes corresponding to the density (10.11) are

$$\hat{w}^r = \frac{w^r}{\sqrt{\pi}}, \qquad q^r = \sqrt{2}\,x^r. \qquad (10.16)$$
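The rescaling (10.16) can be sketched with NumPy, whose `numpy.polynomial.hermite.hermgauss` returns nodes and weights for the physicists' weight $e^{-x^2}$. The helper name `normal_quadrature` is hypothetical; the moments printed at the end are only a sanity check that the scaled rule integrates against the standard normal density (10.11).

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss

def normal_quadrature(R):
    """R-point rule for E[g(Q)], Q ~ N(0, 1), built from physicists' Gauss-Hermite data.

    hermgauss returns nodes x^r and weights w^r for the weight exp(-x^2);
    (10.16) rescales them to q^r = sqrt(2) x^r and w^r / sqrt(pi).
    """
    x, w = hermgauss(R)
    return np.sqrt(2.0) * x, w / np.sqrt(np.pi)

q, w = normal_quadrature(5)
print(w.sum())            # weights of a probability measure sum to 1
print(np.dot(w, q**2))    # E[Q^2] = 1
print(np.dot(w, q**4))    # E[Q^4] = 3 (a 5-point rule is exact to degree 9)
```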






Example 10.3 (Legendre Polynomials for $Q \sim U(-1, 1)$). The Legendre
polynomials, of which the first five are

$$P_0(Q) = 1, \quad P_1(Q) = Q, \quad P_2(Q) = \frac{3}{2}Q^2 - \frac{1}{2},$$
$$P_3(Q) = \frac{5}{2}Q^3 - \frac{3}{2}Q, \quad P_4(Q) = \frac{35}{8}Q^4 - \frac{15}{4}Q^2 + \frac{3}{8}, \qquad (10.17)$$

are orthogonal on the interval $\Gamma = [-1, 1]$ with respect to the density

$$p_Q(q) = \frac{1}{2}. \qquad (10.18)$$

Hence they form a suitable basis for uniformly distributed random variables on
$[-1, 1]$.
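A short numerical check of Example 10.3, assuming NumPy's Legendre module: weighting a Gauss-Legendre rule by the density $p_Q(q) = 1/2$ from (10.18) turns it into an expectation, and the resulting Gram matrix should be diagonal with $\gamma_k = \mathbb{E}[P_k^2(Q)] = 1/(2k+1)$, the standard Legendre normalization for this density. The function name `legendre_gram` is illustrative.

```python
import numpy as np
from numpy.polynomial.legendre import leggauss, Legendre

def legendre_gram(K, R=20):
    """Matrix of E[P_i(Q) P_j(Q)] for Q ~ U(-1, 1), i, j = 0..K.

    leggauss integrates against weight 1 on [-1, 1]; multiplying by the
    density p_Q(q) = 1/2 turns the rule into an expectation.
    """
    x, w = leggauss(R)
    P = np.array([Legendre.basis(k)(x) for k in range(K + 1)])  # (K+1, R)
    return (P * (0.5 * w)) @ P.T

G = legendre_gram(4)
print(np.round(np.diag(G), 6))   # gamma_k = 1/(2k+1)
```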
Representation of Normal and Uniform Random Variables

The next two examples illustrate the construction of the coefficients in
(10.5) and the representation of normal and uniform densities using Hermite and
Legendre polynomials.

Example 10.4 ($u \sim N(\mu, \sigma^2)$). The random variable $u$ can be expressed as

$$u = \mu + \sigma Q, \qquad (10.19)$$

where $Q \sim N(0, 1)$; we all know this from the MATLAB documentation for the
normal random number function randn.m. Hence

$$u^K(Q) = \sum_{k=0}^{K} u_k \psi_k(Q) \qquad (10.20)$$

provides an exact representation for $u$ when $\psi_k(Q)$ are Hermite polynomials and

$$u_0 = \mu, \quad u_1 = \sigma, \quad u_k = 0, \ k > 1.$$

Example 10.5 ($u \sim U(a, b)$). Consider $u \sim U(a, b)$ with mean and variance

$$\mu = \frac{a + b}{2}, \qquad \sigma^2 = \frac{(b - a)^2}{12}.$$

It is established in Exercise 4.0 that

$$u = \mu + \sqrt{3}\,\sigma Q,$$

where $Q \sim U(-1, 1)$. Hence (10.20) also provides an exact representation of $u$ where
$\psi_k(Q)$ are now Legendre polynomials defined on $[-1, 1]$.

10.1.3 Multiple Random Variables 

The fundamental concepts regarding the representation of random vectors and
random processes that are functions of more than one independent random variable are
analogous to the univariate case. The assumption of mutually independent random
variables implies that the expectation of a product of random variables is the
product of their expectations. This motivates constructing multidimensional basis
functions as tensor products of the previously discussed univariate polynomials.
These formulations are simplified through the use of multi-indices.

Definition 10.6 ($p$-Dimensional Multi-Index). A $p$-tuple

$$\mathbf{k} = (k_1, \ldots, k_p) \in \mathbb{N}_0^p$$

of nonnegative integers is termed a $p$-dimensional multi-index with magnitude
$|\mathbf{k}| = k_1 + k_2 + \cdots + k_p$ and satisfying the ordering $\mathbf{j} \le \mathbf{k} \Leftrightarrow j_i \le k_i$ for $i = 1, \ldots, p$.
We now consider a random vector $Q = [Q_1, \ldots, Q_p]$ of mutually independent
random variables and let $\{\psi_{k_i}(Q_i)\}_{k_i=0}^{K}$ denote the univariate basis functions up to
degree $K$ in the variable $Q_i$. The $p$-variate basis functions of total degree less than
or equal to $K$ are defined to be

$$\Psi_{\mathbf{k}}(Q) = \psi_{k_1}(Q_1)\,\psi_{k_2}(Q_2)\cdots\psi_{k_p}(Q_p)$$

for $0 \le |\mathbf{k}| \le K$. The multivariate basis functions thus satisfy the orthogonality
conditions

$$\mathbb{E}[\Psi_{\mathbf{j}}(Q)\Psi_{\mathbf{k}}(Q)] = \int_\Gamma \Psi_{\mathbf{j}}(q)\Psi_{\mathbf{k}}(q)p_Q(q)\,dq = \langle \Psi_{\mathbf{j}}, \Psi_{\mathbf{k}} \rangle_\rho = \gamma_{\mathbf{k}}\,\delta_{\mathbf{j}\mathbf{k}},$$

where $\Gamma = \prod_{i=1}^{p} \Gamma_i$. Similarly, $p_Q(q)$ defined in (5.2) is the product of the densities
associated with each random variable, whereas $\gamma_{\mathbf{k}} = \mathbb{E}[\Psi_{\mathbf{k}}^2(Q)] = \gamma_{k_1}\cdots\gamma_{k_p}$ is the
product of the univariate normalization constants defined in (10.8). Finally, $\delta_{\mathbf{j}\mathbf{k}} =
\delta_{j_1 k_1}\cdots\delta_{j_p k_p}$ denotes the extension of the Kronecker delta to $p$ variables.

For a random process $u(t, x, Q): [0, T] \times \mathcal{D} \times \Gamma \to \mathbb{R}$, we employ the expansion

$$u^K(t, x, Q) = \sum_{|\mathbf{k}|=0}^{K} u_{\mathbf{k}}(t, x)\Psi_{\mathbf{k}}(Q),$$

which is the projection of $u$ onto the space $\mathcal{P}_K(Q)$ of all polynomials of $Q \in \mathbb{R}^p$ of
degree up to $K$.

The multi-index notation provides an elegant mechanism to specify the expansion
based on multiple random variables, but it can be cumbersome to manipulate
when implementing the method. Alternatively, one can employ a single index $k$
chosen according to various conventions. A common choice is the ordering,
demonstrated for $p = 3$ in Table 10.1, that ranks multi-indices in order of increasing
$|\mathbf{k}|$ and, among multi-indices of equal magnitude, places $\mathbf{j}$ before $\mathbf{k}$ when the first
nonzero component of $\mathbf{j} - \mathbf{k}$ is positive. As detailed in [266], other orderings are
advantageous in certain settings. For example, componentwise orderings are employed
in the HDMR expansions discussed in Section 13.5.

The expansion based on a single index is

$$u^K(t, x, Q) = \sum_{k=0}^{K} u_k(t, x)\Psi_k(Q).$$

To achieve order $n$ polynomials, it follows that $K = \dfrac{(n+p)!}{n!\,p!} - 1$, as illustrated for
$p = 3$ and $n = 2$ in Table 10.1.
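The ordering of Table 10.1 can be generated programmatically. This Python sketch (function name hypothetical) groups multi-indices by magnitude and sorts each group in descending lexicographic order, which is one way to realize the convention described above; it also confirms the count $K + 1 = (n+p)!/(n!\,p!)$.

```python
from itertools import product
from math import comb

def graded_multi_indices(p, n):
    """All p-dimensional multi-indices with |k| <= n in the Table 10.1 order.

    Indices are grouped by increasing |k|; within a group, j precedes k when
    the first nonzero entry of j - k is positive (descending lexicographic).
    """
    ordered = []
    for d in range(n + 1):
        group = [k for k in product(range(d + 1), repeat=p) if sum(k) == d]
        ordered.extend(sorted(group, reverse=True))
    return ordered

indices = graded_multi_indices(p=3, n=2)
for k, multi_index in enumerate(indices):
    print(k, multi_index)
print(len(indices) == comb(2 + 3, 3))   # K + 1 = (n + p)! / (n! p!)
```

The single index $k$ is simply the position in this list, so the tensored polynomial for row $k$ is the product of univariate polynomials with degrees given by `indices[k]`.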

The orthogonality of the basis functions can be exploited to obtain the
representation

$$u_k(t, x) = \frac{1}{\gamma_k}\,\mathbb{E}[u(t, x, Q)\Psi_k(Q)] \qquad (10.21)$$

for the deterministic coefficients. Whereas (10.21) is an optimal projection in an $L^2$
sense, it is not generally useful from a computational perspective since $u$ is unknown.
 k   |k|   Multi-index   Polynomial
 0    0    (0,0,0)       $\psi_0(Q_1)\psi_0(Q_2)\psi_0(Q_3)$
 1    1    (1,0,0)       $\psi_1(Q_1)\psi_0(Q_2)\psi_0(Q_3)$
 2    1    (0,1,0)       $\psi_0(Q_1)\psi_1(Q_2)\psi_0(Q_3)$
 3    1    (0,0,1)       $\psi_0(Q_1)\psi_0(Q_2)\psi_1(Q_3)$
 4    2    (2,0,0)       $\psi_2(Q_1)\psi_0(Q_2)\psi_0(Q_3)$
 5    2    (1,1,0)       $\psi_1(Q_1)\psi_1(Q_2)\psi_0(Q_3)$
 6    2    (1,0,1)       $\psi_1(Q_1)\psi_0(Q_2)\psi_1(Q_3)$
 7    2    (0,2,0)       $\psi_0(Q_1)\psi_2(Q_2)\psi_0(Q_3)$
 8    2    (0,1,1)       $\psi_0(Q_1)\psi_1(Q_2)\psi_1(Q_3)$
 9    2    (0,0,2)       $\psi_0(Q_1)\psi_0(Q_2)\psi_2(Q_3)$


Table 10.1. Single index $k$, multi-index $\mathbf{k}$, and tensored polynomials for $p = 3$.


The Galerkin, collocation, and discrete projection methods provide constraints used
to develop numerical algorithms for computing $u_k(t, x)$.

10.2 Galerkin, Collocation, and Discrete Projection 
Frameworks 

The stochastic Galerkin, collocation, and discrete projection frameworks are
analogous to their deterministic counterparts. In the Galerkin framework, one projects
weighted residuals onto a finite-dimensional subspace spanned by appropriate basis
functions to provide the constraints required to solve for the deterministic
coefficients. This projection requires the construction of expectations or inner products
not typically employed in deterministic codes, and hence it is intrusive in the sense
that existing codes must be modified. For collocation methods, the constraints are
provided by approximating the solution of the governing equations at a discrete set
of points termed collocation points. For stochastic collocation, these points are
typically values in the random variable space used to represent parameters or inputs.
Because existing software and codes can be used to determine approximate
solutions at the collocation points, the methods are nonintrusive, which is one of their
primary advantages. Discrete projection relies on the use of quadrature techniques
to approximate (10.21), which requires the solutions $u(t, x, q^r)$ of the governing
equations at the quadrature points $q^r$.

Because Galerkin and collocation methods respectively rely on projection and
interpolation, their convergence analysis differs. Hence from the perspective of
numerical analysis, they are typically addressed as separate topics. From the
perspective of implementation, however, collocation can be interpreted as a special case
of discretized Galerkin that results when basis functions satisfy the delta property
$\Psi_k(q^r) = \delta_{kr}$. Furthermore, discrete projection methods can also be placed in the
Galerkin framework with appropriate choices for basis functions and quadrature
rules. Hence we introduce collocation and discrete projection within a Galerkin
framework.

As noted in Chapter T spectral expansions can be advantageous for problems, 
such as nonlinear PDEs in one or more space dimensions, that are computationally
intensive. However, the complexity of such frameworks obscures the initial
presentation of the stochastic frameworks. Hence to illustrate the framework, we initially
develop it in Section 10.2.1 for a nonlinear scalar initial value problem while noting
that other methods may be equally successful for this problem due to its
simplicity. We illustrate the Galerkin, collocation, and discrete projection methods for
boundary value problems and stationary PDEs in Section 10.2.2 and evolutionary
PDEs in Section 10.2.3. We summarize attributes of the three methods in
Section 10.2.4. We provide examples of the Galerkin method in Section 10.3 and detail
sparse quadrature and collocation techniques in Chapter 11.


10.2.1 Scalar Initial Value Problem


We first consider the scalar ODE

$$\frac{du}{dt} = f(t, Q, u), \quad t > 0, \qquad u(0, Q) = u_0. \qquad (10.22)$$


This setting allows us to illustrate attributes of the methods without the added
complexity of infinite-dimensional states due to spatial dependence. We assume that
$Q = [Q_1, \ldots, Q_p]$ are mutually independent random variables with range $\Gamma \subset \mathbb{R}^p$ and
joint density $p_Q(q)$, as detailed in Section 5.1. The $L^2$ inner product with respect
to this density is denoted by $\langle \cdot, \cdot \rangle_\rho$. We take the quantity of interest (QoI) to be
the expected state value

$$y(t) = \int_\Gamma u(t, q)p_Q(q)\,dq.$$

We consider solutions in the space $L^2(0, T; Z)$, where $Z$ is chosen to reflect the
regularity of the random component of the solution. If we let $\rho_i$ denote the
density of the $i$th component of $Q$, it is natural to consider the space $L^2_{\rho_i}(\Gamma_i)$ of
functions with finite norm

$$\|z\|_{\rho_i} = \left(\int_{\Gamma_i} z^2(q_i)\,\rho_i(q_i)\,dq_i\right)^{1/2},$$

which ensures that second moments are well defined. If higher-order moments are
required, one can alternatively employ spaces $L^m_\rho(\Gamma)$. For mutually independent
random variables with finite second moments, we take

$$Z = L^2_\rho(\Gamma) = L^2_{\rho_1}(\Gamma_1) \otimes \cdots \otimes L^2_{\rho_p}(\Gamma_p). \qquad (10.23)$$


As discussed in Section 5.2, parameters in a number of applications are correlated, which
violates the assumption of mutual independence. It is noted in Section 10.2.4 that this can constitute
a limitation of stochastic spectral methods if Nataf or Rosenblatt transformations are infeasible.
To approximate $u(t, Q)$, we let $\{\Psi_k(Q)\}_{k=0}^{K}$ be a basis for the random space and
take $Z^K = \mathrm{span}\{\Psi_k\} \subset Z$. The projection of $u(t, Q)$ onto $Z^K$ yields the
representation

$$u^K(t, Q) = \sum_{k=0}^{K} u_k(t)\Psi_k(Q). \qquad (10.24)$$

For orthogonal basis functions defined on $\Gamma$ with density $p_Q(q)$, the generalized
Fourier coefficients satisfy

$$u_k(t) = \frac{1}{\gamma_k}\int_\Gamma u(t, q)\Psi_k(q)p_Q(q)\,dq, \qquad (10.25)$$

where $\gamma_k = \langle \Psi_k, \Psi_k \rangle_\rho$. Since we typically cannot evaluate (10.25) explicitly, the
specification of constraints required to approximate the coefficients $u_k(t)$ is one factor
that defines solution methods. The specification of appropriate basis functions is
the second issue that delineates various solution techniques.


Stochastic Galerkin Method

As in the deterministic problem, the strategy for stochastic Galerkin methods
is to project the weighted residual onto a finite-dimensional space spanned by
appropriate basis functions. Here we employ polynomials that are orthogonal with
respect to the density $p_Q(q)$. We thus seek approximate solutions $u^K(t, Q)$ that
satisfy

$$0 = \int_\Gamma \left[\sum_{k=0}^{K} \frac{du_k}{dt}\Psi_k(q) - f\Big(t, q, \sum_{k=0}^{K} u_k(t)\Psi_k(q)\Big)\right]\Psi_i(q)p_Q(q)\,dq \qquad (10.26)$$

for all $i = 0, \ldots, K$. This is often termed the weak stochastic Galerkin formulation, and
it is equivalent to specifying that

$$\mathbb{E}\left[\frac{\partial u^K(t, Q)}{\partial t}\Psi_i(Q)\right] = \mathbb{E}\left[f(t, Q, u^K)\Psi_i(Q)\right]. \qquad (10.27)$$


Initial conditions for (10.26) or (10.27) are constructed by projecting the original
initial conditions onto the space of polynomials $Z^K$ in the manner detailed in
Section 11.2.

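Approximation of the inner product using a quadrature rule with points $q^r$
and weights $w^r$, $r = 1, \ldots, R$, yields

$$\sum_{r=1}^{R}\left[\sum_{k=0}^{K} \frac{du_k}{dt}\Psi_k(q^r) - f\Big(t, q^r, \sum_{k=0}^{K} u_k(t)\Psi_k(q^r)\Big)\right]\Psi_i(q^r)\,p_Q(q^r)\,w^r = 0, \qquad (10.28)$$

which holds for $i = 0, \ldots, K$.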
For low-dimensional parameter spaces, tensored Gaussian quadrature
techniques provide sufficient accuracy and efficiency. However, tensored quadrature
techniques are not applicable for even moderate dimensions due to the exponential
growth in the required number of points. For low to moderate dimensions, this is
addressed by the sparse grid techniques discussed in Section 11.1.
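To make (10.26) through (10.28) concrete, here is a minimal stochastic Galerkin sketch in Python for a hypothetical linear test problem, $du/dt = -(a + bQ)u$ with $u(0) = 1$ and $Q \sim U(-1, 1)$, using the Legendre basis of Example 10.3. The problem, parameter values, function name, and the choice of an RK4 time integrator are assumptions made for illustration. Because $f$ is linear in $u$, the quadrature in (10.28) reduces the Galerkin equations to a coupled linear ODE system for the coefficients $u_k(t)$, and the mean and variance then follow from (10.9) and (10.10). The exact mean $e^{-aT}\sinh(bT)/(bT)$ is available for comparison.

```python
import numpy as np
from numpy.polynomial.legendre import leggauss, Legendre

def galerkin_decay(a, b, K, T=1.0, dt=1e-3, R=30):
    """Stochastic Galerkin sketch for du/dt = -(a + b Q) u, u(0) = 1, Q ~ U(-1, 1).

    Applies (10.28) with Legendre basis functions and a Gauss-Legendre rule,
    for which p_Q(q^r) w^r = w^r / 2. Returns u_k(T), k = 0..K, and gamma_k.
    """
    x, w = leggauss(R)
    pw = 0.5 * w                                                # p_Q(q^r) w^r
    P = np.array([Legendre.basis(k)(x) for k in range(K + 1)])  # P[k, r] = Psi_k(q^r)
    gamma = (P**2) @ pw                                         # gamma_i
    # Galerkin system: gamma_i du_i/dt = -sum_k E[(a + bQ) Psi_k Psi_i] u_k
    M = (P * (a + b * x) * pw) @ P.T / gamma[:, None]
    u = np.zeros(K + 1)
    u[0] = 1.0                                                  # projection of u(0) = 1
    for _ in range(int(round(T / dt))):                         # classical RK4 in time
        k1 = -M @ u
        k2 = -M @ (u + 0.5 * dt * k1)
        k3 = -M @ (u + 0.5 * dt * k2)
        k4 = -M @ (u + dt * k3)
        u = u + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
    return u, gamma

a, b, T = 1.0, 0.5, 1.0
u, gamma = galerkin_decay(a, b, K=8, T=T)
mean_exact = np.exp(-a * T) * np.sinh(b * T) / (b * T)
var_exact = np.exp(-2 * a * T) * np.sinh(2 * b * T) / (2 * b * T) - mean_exact**2
print(u[0], mean_exact)                          # mean from (10.9)
print(np.sum(u[1:]**2 * gamma[1:]), var_exact)   # variance from (10.10)
```

The coupling through the matrix `M` is what makes the method intrusive: a deterministic solver for a single value of $Q$ cannot be reused without modification.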

Collocation 

The basic strategy for collocation is quite simple: (i) using either deterministic
or stochastic methods, generate $M$ samples $\{q^m\}_{m=1}^{M}$ from the parameter space $\Gamma$,
which constitute the collocation points, and (ii) enforce

$$u(t, q^m) = u^K(t, q^m) \qquad (10.29)$$

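to provide the constraints necessary to solve for $u_k(t)$ at each time. For general
basis functions $\Psi_k$, the constraint (10.29) yields the Vandermonde matrix system

$$\begin{bmatrix} \Psi_0(q^1) & \cdots & \Psi_K(q^1) \\ \vdots & & \vdots \\ \Psi_0(q^M) & \cdots & \Psi_K(q^M) \end{bmatrix}\begin{bmatrix} u_0(t) \\ \vdots \\ u_K(t) \end{bmatrix} = \begin{bmatrix} u(t, q^1) \\ \vdots \\ u(t, q^M) \end{bmatrix}. \qquad (10.30)$$

To prevent the system from being underdetermined, one would typically require
$M \ge K + 1$. For overdetermined systems, $M > K + 1$, (10.30) can be solved in the
least squares sense using, for example, the MATLAB command pinv.m.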
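A sketch of the least-squares collocation solve (10.30) in Python, assuming a Legendre basis and randomly sampled collocation points for a hypothetical model $u(t, q) = e^{-(a+bq)t}$ with $Q \sim U(-1, 1)$. In practice each entry of `u_samples` would come from one run of an existing deterministic code, which is what makes the approach nonintrusive; `numpy.linalg.lstsq` is used here in place of MATLAB's pinv, and the function name is illustrative.

```python
import numpy as np
from numpy.polynomial.legendre import Legendre

def collocation_coefficients(u_samples, q, K):
    """Least-squares solution of the Vandermonde system (10.30).

    q holds the M collocation points and u_samples holds u(t, q^m) at one
    time t; Legendre polynomials serve as the basis Psi_k.
    """
    V = np.column_stack([Legendre.basis(k)(q) for k in range(K + 1)])  # M x (K+1)
    coeffs, *_ = np.linalg.lstsq(V, u_samples, rcond=None)
    return coeffs

# Hypothetical model u(t, q) = exp(-(a + b q) t) with Q ~ U(-1, 1), at t = 1
a, b, t = 1.0, 0.5, 1.0
rng = np.random.default_rng(0)
q = rng.uniform(-1.0, 1.0, size=60)            # stochastic choice of M = 60 points
coeffs = collocation_coefficients(np.exp(-(a + b * q) * t), q, K=8)
mean_exact = np.exp(-a * t) * np.sinh(b * t) / (b * t)
print(coeffs[0], mean_exact)                   # u_0 approximates the mean (10.9)
```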

Although very straightforward to formulate, two issues can make (10.30) diffi- 
cult or impossible to implement for large parameter dimensions. The first concerns 
the choice of collocation points q m . As will be illustrated in Section 11.2, uniform 
meshes can produce spurious oscillations and yield highly ill-conditioned colloca- 
tion matrices. This can be avoided in one dimension by using nonuniformly spaced 
points, but care must be exercised when extending these meshes to multiple dimen- 
sions. 

The second issue regards the choice of basis functions. General basis functions
yield an $M \times (K + 1)$ dense matrix, where $M$ will be extremely large for
high-dimensional parameter spaces. This is avoided if one employs Lagrange polynomials
that satisfy

$$L_k(q^m) = \delta_{km} \qquad (10.31)$$

as basis functions $\Psi_k(q)$. For $M = K + 1$ collocation points, the collocation matrix
is now a $(K + 1) \times (K + 1)$ identity matrix and

$$u_m(t) = u(t, q^m) \qquad (10.32)$$

for $m = 1, \ldots, M$.

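This collocation relation can also be obtained directly from (10.28) if one
employs Lagrange polynomials as basis functions and uses the quadrature points $q^r$
as collocation points $q^m$. Since the nonzero factor $p_Q(q^m)w^m$ then cancels, this yields

$$\frac{du_m}{dt} = f(t, q^m, u_m(t)), \qquad (10.33)$$

which is equivalent to specifying $u_m(t)$ by (10.32).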



Remark 10.7. The Lagrange basis is orthogonal only for the set of collocation
points for which it was defined. Hence the relation (10.25), which relies on
orthogonality with respect to $p_Q(q)$, does not generally hold.


Discrete Projection 

The strategy for discrete projection, which is also termed pseudospectral, is
to approximate (10.25) by

$$u_k(t) \approx \frac{1}{\gamma_k}\sum_{r=1}^{R} u(t, q^r)\Psi_k(q^r)\,p_Q(q^r)\,w^r \qquad (10.34)$$

to provide the time-dependent deterministic coefficients. Since $u(t, q^r)$ is the
solution of (10.22) at the $R$ quadrature points, the computational effort is essentially
equivalent to that of the collocation method.
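The pseudospectral formula (10.34) can be sketched in Python as follows, assuming $Q \sim U(-1, 1)$, a Legendre basis, and a Gauss-Legendre rule, so that $p_Q(q^r)w^r = w^r/2$. The model passed in below, $u(t, q) = e^{-(a+bq)t}$ at a fixed time, is a hypothetical stand-in for solutions of (10.22); each evaluation represents one independent deterministic solve, and the function name is illustrative.

```python
import numpy as np
from numpy.polynomial.legendre import leggauss, Legendre

def discrete_projection(u_at, K, R):
    """Pseudospectral coefficients (10.34) for Q ~ U(-1, 1) with a Legendre basis.

    u_at(q) returns u(t, q) at an array of quadrature points, i.e., R
    deterministic model solves; p_Q(q^r) w^r = w^r / 2 for the uniform density.
    """
    x, w = leggauss(R)
    pw = 0.5 * w
    u_vals = u_at(x)
    coeffs = np.empty(K + 1)
    for k in range(K + 1):
        psi_k = Legendre.basis(k)(x)
        gamma_k = np.sum(psi_k**2 * pw)
        coeffs[k] = np.sum(u_vals * psi_k * pw) / gamma_k
    return coeffs

# Hypothetical model u(t, q) = exp(-(a + b q) t) evaluated at t = 1
a, b, t = 1.0, 0.5, 1.0
coeffs = discrete_projection(lambda q: np.exp(-(a + b * q) * t), K=6, R=10)
mean = coeffs[0]                                              # (10.9)
variance = np.sum(coeffs[1:]**2 / (2 * np.arange(1, 7) + 1))  # (10.10), gamma_k = 1/(2k+1)
mean_exact = np.exp(-a * t) * np.sinh(b * t) / (b * t)
print(mean, mean_exact)
```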


10.2.2 Boundary Value Problems and Elliptic PDEs 

We now formulate the stochastic Galerkin method for boundary value problems and
PDEs that exhibit spatial dependence and hence have infinite-dimensional states.
To simplify the discussion, we initially focus on steady problems. The strong
formulation of the deterministic model is specified as

$$\mathcal{N}(u, Q) = F(Q), \quad x \in \mathcal{D},$$
$$\mathcal{B}(u, Q) = G(Q), \quad x \in \partial\mathcal{D}, \qquad (10.35)$$

where $u(x, Q)$ is the state, $\mathcal{N}$ is a potentially nonlinear differential operator, $F(Q)$ is
a source term, and $\mathcal{B}(u, Q)$ and $G(Q)$ are boundary operators. The spatial domain
$\mathcal{D}$ is a subset of $\mathbb{R}$, $\mathbb{R}^2$, or $\mathbb{R}^3$. In general, $\mathcal{N}$, $F$, $\mathcal{B}$, and $G$ can also depend on the
spatial variable $x$, but we suppress this dependence to simplify notation.

As a QoI, we consider the expected state value

$$y(x) = \int_\Gamma u(x, q)p_Q(q)\,dq \qquad (10.36)$$

at points $x \in \mathcal{D}$. More complex QoI can be evaluated in a similar manner.

To construct a weak formulation of the deterministic problem, we let $V$ denote
a space of test functions that satisfy the essential boundary conditions. The
deterministic weak formulation can then be posed as the problem of finding $u \in V$,
which satisfies

$$\int_{\mathcal{D}} N(u, Q)S(v)\,dx = \int_{\mathcal{D}} F(Q)v\,dx \qquad (10.37)$$

for all $v \in V$. The potentially nonlinear differential operator $N$ and linear operator
$S$ are constructed from $\mathcal{N}$ using integration by parts.
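Example 10.8. Consider the heat equation

$$
\begin{aligned}
-\alpha\frac{d^2u}{dx^2} &= f(x), \quad -1 < x < 1,\\
u(-1) &= u(1) = 0,
\end{aligned} \tag{10.38}
$$

where $Q = \alpha$. Here

$$
\mathcal{N}(u,Q) = -\alpha\frac{d^2u}{dx^2}, \quad F(Q) = f(x), \quad \mathcal{B}(u,Q) = u, \quad G(Q) = 0 \tag{10.39}
$$

and the domain is $\mathcal{D} = (-1,1)$. An appropriate space of test functions is $V = H_0^1(-1,1)$ and the weak formulation is

$$
\int_{-1}^{1} \alpha\frac{du}{dx}\frac{dv}{dx}\,dx = \int_{-1}^{1} f(x)\,v(x)\,dx, \tag{10.40}
$$

which must hold for all $v \in V$. Hence the operators $N$ and $S$ are

$$
N(u,Q) = \alpha\frac{du}{dx}, \qquad S(v) = \frac{dv}{dx}. \tag{10.41}
$$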

Stochastic Weak Formulation

For the random differential equation, we seek solutions $u(x,Q) \in V \otimes Z$, where $Z$ is defined in (10.23) and $V$ is typically a Sobolev space. The weak stochastic model formulation can be posed as follows: find $u \in V \otimes Z$, which satisfies

$$
\int_{\Gamma}\int_{\mathcal{D}} N(u,q)\,S(v(x))\,z(q)\,\rho_Q(q)\,dx\,dq = \int_{\Gamma}\int_{\mathcal{D}} F(q)\,v(x)\,z(q)\,\rho_Q(q)\,dx\,dq
$$

for all test functions $v \in V$, $z \in Z$.

To approximate the solution $u(x,Q)$, we let $\{\phi_j(x)\}_{j=1}^{J}$ and $\{\Psi_k(Q)\}_{k=0}^{K}$ be bases for the spatial and random spaces and take

$$
V^J = \operatorname{span}\{\phi_j\} \subset V, \qquad Z^K = \operatorname{span}\{\Psi_k\} \subset Z.
$$

Typical choices for $\phi_j$ are splines, finite elements, or spectral functions, whereas we employ the spectral polynomials discussed in Section 10.1 for $\Psi_k$. Due to the product nature of the domain $\mathcal{D}\times\Gamma$ and space $V \otimes Z$, we employ approximate solutions of the form

$$
u^K(x,Q) = \sum_{k=0}^{K} u_k(x)\,\Psi_k(Q) = \sum_{k=0}^{K}\sum_{j=1}^{J} u_{jk}\,\phi_j(x)\,\Psi_k(Q).
$$
Stochastic Galerkin Method

In the Galerkin method, we project the residuals for the discrete problem onto the space of test functions to obtain

$$
\int_{\Gamma}\int_{\mathcal{D}} N\Big(\sum_{k=0}^{K}\sum_{j=1}^{J} u_{jk}\,\phi_j(x)\Psi_k(q),\,q\Big)\, S(\phi_\ell(x))\,\Psi_i(q)\,\rho_Q(q)\,dx\,dq = \int_{\Gamma}\int_{\mathcal{D}} F(q)\,\phi_\ell(x)\,\Psi_i(q)\,\rho_Q(q)\,dx\,dq, \tag{10.42}
$$

which holds for $\ell = 1,\ldots,J$ and $i = 0,\ldots,K$.

To determine the coefficients $u_{jk}$, we must approximate the integrals. In the $p$-dimensional parameter space, we employ $R$ quadrature points $q^r$ and weights $w^r$ to obtain

$$
\sum_{r=1}^{R} \Psi_i(q^r)\rho_Q(q^r)w^r \int_{\mathcal{D}} N\Big(\sum_{k=0}^{K}\sum_{j=1}^{J} u_{jk}\,\phi_j(x)\Psi_k(q^r),\,q^r\Big)\, S(\phi_\ell(x))\,dx = \sum_{r=1}^{R} \Psi_i(q^r)\rho_Q(q^r)w^r \int_{\mathcal{D}} F(q^r)\,\phi_\ell(x)\,dx \tag{10.43}
$$

for $\ell = 1,\ldots,J$ and $i = 0,\ldots,K$. Standard Gaussian quadrature techniques can be employed to approximate the spatial integrals, as detailed in [95, 118, 225]. This yields a $J(K+1)\times J(K+1)$ system that is fully coupled in the physical and parameter spaces.

We must also approximate the integral to evaluate the QoI (10.36). Whereas different quadrature rules can be employed, we simplify the discussion by employing the same quadrature points and weights. This yields

$$
y(x) = \sum_{r=1}^{R} w^r\rho_Q(q^r)\sum_{k=0}^{K}\sum_{j=1}^{J} u_{jk}\,\phi_j(x)\,\Psi_k(q^r), \tag{10.44}
$$

which is easily evaluated once the coefficients $u_{jk}$ have been determined.

For very low parameter dimensions (e.g., $p < 5$), one can employ tensored 1-D Gaussian techniques. For low to moderate dimensionalities, the sparse grid techniques summarized in Section 11.1 provide a reasonable balance between accuracy and efficiency.


Collocation

To define the collocation system, we enforce

$$
u^K(x,q^m) = u_m(x) = \sum_{j=1}^{J} u_{jm}\,\phi_j(x)
$$

at $M$ collocation points by taking the basis functions $\Psi_k$ to be Lagrange polynomials $L_k$, which satisfy the collocation property $L_k(q^m) = \delta_{km}$ at the collocation points. This yields the $M$ relations

$$
\int_{\mathcal{D}} N\Big(\sum_{j=1}^{J} u_{jm}\,\phi_j(x),\,q^m\Big)\, S(\phi_\ell(x))\,dx = \int_{\mathcal{D}} F(q^m)\,\phi_\ell(x)\,dx \tag{10.45}
$$

for $\ell = 1,\ldots,J$. For each collocation point $q^m$, solution for $u_{jm}$ requires the solution of a $J\times J$ system. This can be accomplished using existing codes or executable files, and hence the method is nonintrusive. The solution of the $M$ systems is decoupled and highly parallelizable.

We note that (10.45) can be obtained from the discretized Galerkin system (10.43) if one takes the collocation points $q^m$ to be the quadrature points $q^r$ and employs the basis $\Psi_k = L_k$: since $L_i(q^r) = \delta_{ir}$, the sum over $r$ in (10.43) collapses to the single term $r = i$. As detailed in Sections 11.1 and 11.2, we use the same sparse grid nodes for quadrature and when constructing interpolating polynomials for collocation.

To evaluate the QoI (10.36), the choice of collocation points $q^m$ as quadrature points and use of Lagrange polynomials as basis functions yield

$$
y(x) = \sum_{r=1}^{R} w^r\rho_Q(q^r)\sum_{j=1}^{J} u_{jr}\,\phi_j(x) = \sum_{r=1}^{R} w^r\rho_Q(q^r)\,u_r(x), \tag{10.46}
$$

where

$$
u_r(x) = u_m(x) = \sum_{j=1}^{J} u_{jr}\,\phi_j(x) \tag{10.47}
$$

has been previously computed when solving (10.45).
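The decoupled structure of (10.45)–(10.47) can be sketched for the heat equation of Example 10.8 discretized with linear finite elements. Purely for illustration, the sketch assumes $f \equiv 1$, $\alpha \sim U(a,b)$, and Gauss–Legendre nodes (NumPy's `leggauss`) as both collocation and quadrature points, so that $\rho_Q(q^r)w^r$ becomes the Gauss–Legendre weight divided by 2. Each node requires one independent $(J-1)\times(J-1)$ solve; the helper names `fe_solve` and `collocation_mean` are hypothetical.

```python
import numpy as np
from numpy.polynomial.legendre import leggauss


def fe_solve(alpha, J, f=1.0):
    """Linear finite elements for -alpha u'' = f on (-1, 1), u(-1) = u(1) = 0.

    Returns the J - 1 interior nodal values; f is assumed constant.
    """
    h = 2.0 / J
    n = J - 1
    stiffness = (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h
    load = f * h * np.ones(n)            # int_{-1}^{1} f phi_l dx for constant f
    return np.linalg.solve(alpha * stiffness, load)


def collocation_mean(a, b, J, M):
    """Collocation estimate of E[u(x_j, alpha)] for alpha ~ U(a, b), cf. (10.46)."""
    q, w = leggauss(M)                   # Gauss-Legendre nodes on [-1, 1]
    alphas = 0.5 * (a + b) + 0.5 * (b - a) * q
    weights = w / 2.0                    # rho_Q = 1/2 on [-1, 1]
    # M decoupled deterministic solves, i.e., the u_m(x) of (10.47)
    solutions = [fe_solve(alpha_m, J) for alpha_m in alphas]
    return sum(wm * um for wm, um in zip(weights, solutions))


J, a, b = 8, 0.5, 1.5
x = -1.0 + (2.0 / J) * np.arange(1, J)
approx = collocation_mean(a, b, J, M=10)
# For f = 1 the exact solution is (1 - x^2) / (2 alpha), so the exact mean is
# (1 - x^2) / 2 * E[1/alpha] = (1 - x^2) / 2 * ln(b/a) / (b - a).
exact = 0.5 * (1.0 - x**2) * np.log(b / a) / (b - a)
print(np.max(np.abs(approx - exact)))
```

Because the deterministic solver is called as a black box inside the list comprehension, `fe_solve` could be replaced by an existing code without changing `collocation_mean`, which is the nonintrusive property noted above.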


Discrete Projection

For discrete projection, we employ the approximation

$$
u_k(x) \approx \frac{1}{\gamma_k}\sum_{r=1}^{R} u(x,q^r)\,\Psi_k(q^r)\,\rho_Q(q^r)\,w^r
$$

to construct the generalized Fourier coefficients. This involves essentially the same computational effort as collocation since it requires the solution of $R$ deterministic problems of size $J$.


10.2.3 Evolutionary PDEs

The extension of the stochastic Galerkin, collocation, and discrete projection methods to an evolutionary PDE of the form

$$
\begin{aligned}
\frac{\partial u}{\partial t} &= \mathcal{N}(u,Q) + F(Q), && x \in \mathcal{D},\ t \in [0,\infty),\\
\mathcal{B}(u,Q) &= G(Q), && x \in \partial\mathcal{D},\ t \in [0,\infty),\\
u(0,x,Q) &= I(Q), && x \in \mathcal{D},
\end{aligned}
$$

can be achieved by combining the approaches detailed in Sections 10.2.1 and 10.2.2. Here $\mathcal{N}$ again denotes a potentially nonlinear spatial differential operator, $F$ is a source term, $\mathcal{B}$ and $G$ are boundary operators, $I$ specifies initial conditions, and $\mathcal{D}$ is a subset of $\mathbb{R}$, $\mathbb{R}^2$, or $\mathbb{R}^3$. To simplify notation, we suppress the dependence of these operators on $x$ and $t$. The weak deterministic model formulation is

$$
\int_{\mathcal{D}} \frac{\partial u}{\partial t}\,v\,dx + \int_{\mathcal{D}} N(u,Q)\,S(v)\,dx = \int_{\mathcal{D}} F(Q)\,v\,dx,
$$

which holds for $v \in V$, where $V$ is a space of test functions that satisfy essential boundary conditions. The weak stochastic model formulation is

$$
\int_{\Gamma}\int_{\mathcal{D}} \frac{\partial u}{\partial t}\,v(x)\,z(q)\,\rho_Q(q)\,dx\,dq + \int_{\Gamma}\int_{\mathcal{D}} N(u,q)\,S(v(x))\,z(q)\,\rho_Q(q)\,dx\,dq = \int_{\Gamma}\int_{\mathcal{D}} F(q)\,v(x)\,z(q)\,\rho_Q(q)\,dx\,dq
$$

for $v \in V$ and $z \in Z$, where $Z$ is defined in (10.23). The QoI is taken to be the expected value

$$
y(t,x) = \int_{\Gamma} u(t,x,q)\,\rho_Q(q)\,dq.
$$


To approximate $u(t,x,q)$, we again construct finite-dimensional subspaces $V^J = \operatorname{span}\{\phi_j\} \subset V$ and $Z^K = \operatorname{span}\{\Psi_k\} \subset Z$, where the spatial basis functions $\phi_j(x)$ are splines, finite elements, or spectral functions and orthogonal polynomials $\Psi_k(q)$ are employed as a basis for the random component. The approximate solutions are then taken to be

$$
u^K(t,x,q) = \sum_{k=0}^{K}\sum_{j=1}^{J} u_{jk}(t)\,\phi_j(x)\,\Psi_k(q).
$$


Stochastic Galerkin Method

The use of a quadrature rule with points $q^r$ and weights $w^r$ yields the Galerkin system

$$
\begin{aligned}
&\sum_{r=1}^{R} \Psi_i(q^r)\rho_Q(q^r)w^r \sum_{k=0}^{K}\sum_{j=1}^{J} \frac{du_{jk}}{dt}(t)\,\Psi_k(q^r)\int_{\mathcal{D}} \phi_j(x)\phi_\ell(x)\,dx\\
&\quad + \sum_{r=1}^{R} \Psi_i(q^r)\rho_Q(q^r)w^r \int_{\mathcal{D}} N\Big(\sum_{k=0}^{K}\sum_{j=1}^{J} u_{jk}(t)\,\phi_j(x)\Psi_k(q^r),\,q^r\Big)\, S(\phi_\ell(x))\,dx\\
&\quad = \sum_{r=1}^{R} \Psi_i(q^r)\rho_Q(q^r)w^r \int_{\mathcal{D}} F(q^r)\,\phi_\ell(x)\,dx,
\end{aligned} \tag{10.48}
$$

which holds for $\ell = 1,\ldots,J$ and $i = 0,\ldots,K$. Once the coefficients have been obtained by integrating the $J(K+1)$ system, the approximated QoI is

$$
y(t,x) = \sum_{r=1}^{R} w^r\rho_Q(q^r)\sum_{k=0}^{K}\sum_{j=1}^{J} u_{jk}(t)\,\phi_j(x)\,\Psi_k(q^r).
$$


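Collocation

The basis choice $\Psi_k(q) = L_k(q)$, which collocates at the quadrature points $q^r$, yields for each $r = 1,\ldots,R$ the evolution equations

$$
\sum_{j=1}^{J} \frac{du_{jr}}{dt}\int_{\mathcal{D}} \phi_j(x)\phi_\ell(x)\,dx + \int_{\mathcal{D}} N\Big(\sum_{j=1}^{J} u_{jr}(t)\,\phi_j(x),\,q^r\Big)\, S(\phi_\ell(x))\,dx = \int_{\mathcal{D}} F(q^r)\,\phi_\ell(x)\,dx
$$

for $\ell = 1,\ldots,J$. The QoI is approximated by

$$
y(t,x) = \sum_{r=1}^{R} w^r\rho_Q(q^r)\,u_r(t,x),
$$

where

$$
u_r(t,x) = \sum_{j=1}^{J} u_{jr}(t)\,\phi_j(x).
$$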

Discrete Projection

For the discrete projection method, the generalized Fourier coefficients are approximated by the representation

$$
u_k(t,x) = \frac{1}{\gamma_k}\sum_{r=1}^{R} u(t,x,q^r)\,\Psi_k(q^r)\,\rho_Q(q^r)\,w^r.
$$

10.2.4 Attributes of the Galerkin, Collocation, and Discrete Projection Methods

Stochastic Galerkin

* Because the Galerkin method is based on the projection of the residual onto the space of approximating polynomials, its accuracy is optimal in an $L^2$ sense. This can reduce the number of required computations.

* For boundary value problems and PDEs in which the states are approximated by $J$ elements, solution for $u_{jk}$ requires the solution of a $J(K+1)\times J(K+1)$ system that generally exhibits coupling between the spatial and parameter components. As illustrated in the examples of Section 10.3, the system can occasionally be decoupled for linear problems when the same polynomials are used to approximate parameters and states, e.g., when Hermite or Legendre polynomials are used to represent Gaussian or uniform random variables.

* Quadrature rules to approximate integrals over the $p$-dimensional parameter space are required at two points for implementation: evaluation of the inner products in the projection and evaluation of the QoI. For low to moderate dimensions, this necessitates use of the sparse grid techniques summarized in Section 11.1.


The stochastic Galerkin method has three primary disadvantages. First, it can be used only for densities with associated orthogonal polynomials. This precludes its use for general densities constructed using Bayesian techniques. Second, it relies on the assumption that parameters are mutually independent, which, as discussed in Section 5.2, is often not the case in applications. For applications where marginal distributions and correlation matrices are available, this can often be addressed using a Nataf transformation. Finally, a new system associated with the projection must be developed and implemented. Hence the method is intrusive and existing codes typically cannot be directly used to construct the coefficients. For complex applications with large legacy codes or problems for which executable files may be the only option, this disadvantage can be prohibitive.

Convergence theory for the stochastic Galerkin method can be found in [10, 118] and the references cited therein.


Stochastic Collocation

* Whereas the convergence analysis for collocation methods is based on interpolation theory for polynomials, implementation algorithms can be constructed by employing Lagrange polynomials as basis and test functions in the discretized Galerkin framework. By choosing the quadrature points as collocation points, one can decouple the stochastic and deterministic components of the problem. Despite this construction, however, we emphasize that collocation is an interpolation technique, in contrast to the Galerkin and discrete projection techniques, which are projection methods. This is manifested by the fact that the approximation space changes as the number of collocation points changes.

* The method is nonintrusive in the sense that once the $M$ collocation points are specified, one solves $M$ deterministic problems using existing software, including legacy codes or executable files. This decoupled solution technique is highly parallelizable and can be viewed as postprocessing. This constitutes a major advantage over the Galerkin method.

* For boundary value problems and stationary PDEs, computation of the coefficients $u_{jm}$ and construction of $u_m(x)$ given by (10.47) require the solution of $M$ deterministic problems of size $J$. This is in comparison to the solution of a single system of size $J(K+1)$ for the Galerkin method. Once $u_m(x)$ has been constructed, the QoI simply requires a vector multiply (10.46) rather than evaluation of the basis functions at quadrature points, as required for the Galerkin method.
* The construction of Lagrange polynomials and choice of collocation points, for low to moderate parameter dimensions, constitute the two critical aspects of the method. As discussed in Section 11.2, this is often addressed using sparse collocation techniques.

* An important feature of the collocation approach is the fact that it is applicable to general parameter distributions, including those constructed using the Bayesian model calibration techniques of Chapter 8. This is an advantage over the Galerkin and discrete projection methods, where the orthogonality of the inner products depends on the compatibility between $\rho_Q(q)$ and the basis functions, e.g., Hermite and Legendre polynomials for normal and uniform densities. Whereas collocation does not explicitly require that parameters be mutually independent, the evaluation of the QoI requires sampling from the estimated joint density $\rho_Q(q)$. For correlated parameter sets, this requires that $\rho_Q(q)$ be constructed directly since it is not the product of the marginal densities in this case. Alternatively, if marginal densities and a correlation matrix are available, one can employ a Nataf transformation in the manner described in Section 5.2 to formulate the problem in terms of mutually independent Gaussian random variables. For applications where this information is not available, one may need to employ the sampling methods detailed in Section 9.2 and illustrated in Example 9.14.

* The accuracy of the method is dictated by the accuracy of the approximating polynomials. As detailed in Section 11.2, the interpolation error for $p$ parameters and $M$ collocation points is $\|f - \mathcal{X}_M f\| = O(M^{-\alpha/p})$, where $\mathcal{X}_M$ is the interpolation operator and $\alpha$ depends on regularity of the solution. Hence the accuracy degrades as the dimension increases. Comparison with the convergence rate $O(M^{-1/2})$ for Monte Carlo methods illustrates that the latter is more efficient for large $p$. The stochastic Galerkin method is generally more accurate than stochastic collocation but requires the intrusive construction of the discrete system.

* Details regarding the theory and application of the stochastic collocation method to steady and evolutionary PDEs can be found in [18, 101, 187, 267].
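The rate comparison in the accuracy bullet above can be made explicit. Taking the two bounds at face value and ignoring their constants (an assumption; constants and pre-asymptotic behavior matter in practice), for $M > 1$ the collocation rate is at least as fast as the Monte Carlo rate exactly when

$$
M^{-\alpha/p} \le M^{-1/2} \iff \frac{\alpha}{p} \ge \frac{1}{2} \iff p \le 2\alpha .
$$

For instance, a solution whose regularity gives $\alpha = 2$ favors collocation asymptotically only for $p \le 4$.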


Discrete Projection

* The discrete projection method is also termed pseudospectral, nonintrusive PC, and nonintrusive spectral projection (NISP) in the literature. It shares the two primary advantages of the collocation method: it decouples the random and deterministic components of the problem, and it is nonintrusive. Once the $R$ quadrature points are chosen, it requires $R$ solutions of the deterministic problem, which is of size $J$ for boundary value problems and stationary PDEs. This can be achieved using existing software and is highly parallelizable.

* The method is equivalent to collocation if Lagrange polynomials are employed as basis functions, $\Psi_k = L_k$, and the density and weight are unity.
* For implementation with Hermite or Legendre polynomials, the assumption of mutually independent random variables is generally required to construct the density $\rho_Q(q)$. As discussed in Section 5.2, however, parameters are often correlated even for very simple models. This necessitates the use of a Nataf or Rosenblatt transformation to construct uncorrelated parameter sets or alternative techniques such as the sampling methods detailed in Section 9.2.

* Details regarding the construction of sparse grid quadrature techniques for low to moderate parameter dimensions are provided in Section 11.1.

* Theory for the discrete projection method can be found in [18].


10.3 Stochastic Galerkin Method — Examples 

In this section, we illustrate aspects of the stochastic Galerkin method using very simple ODE examples in Section 10.3.1 and a 1-D boundary value problem in Section 10.3.2. In both cases, we illustrate improvements in efficiency that can be realized when the same basis functions are used to represent the parameters and state solutions for linear problems. Whereas these examples illustrate computations in a setting where they can be done explicitly, they do not represent the complexity of applications where stochastic Galerkin methods are advantageous or required. We refer the reader to [148] for details regarding the applications of the method to a 2-D heat equation approximated by finite elements as well as flow problems including the Navier–Stokes equations. An overview summarizing the application of stochastic Galerkin methods to flow in porous media, incompressible and compressible flows, reacting flows, and thermofluid flows can be found in [180, 263] and the cited references.


10.3.1 Scalar Initial Value Problem


Example 10.9. To illustrate the stochastic Galerkin method, we consider the initial value problem

$$
\begin{aligned}
\frac{du}{dt} &= -\alpha u, \quad t > 0,\\
u(0,\omega) &= \beta,
\end{aligned} \tag{10.49}
$$

where $\beta$ is deterministic and fixed. We first assume that $\alpha \sim N(\bar\alpha, \sigma_\alpha^2)$ with $\bar\alpha > 0$. The QoI are the mean $\bar u(t)$ and variance $\operatorname{var}[u(t)]$. As illustrated in Example 10.4, the random parameter can be expressed exactly as


$$
\alpha = \alpha^N = \sum_{n=0}^{N} \alpha_n\Psi_n(Q), \quad \alpha_0 = \bar\alpha,\ \alpha_1 = \sigma_\alpha,\ \alpha_n = 0,\ n > 1, \tag{10.50}
$$

where $\Psi_n(Q)$ are Hermite polynomials defined on $\mathbb{R}$. Here $Q \sim N(0,1)$ with density $\rho_Q(q) = \frac{1}{\sqrt{2\pi}}e^{-q^2/2}$. The deterministic initial condition also has an exact representation in the space of Hermite polynomials since

$$
\beta = \beta^N = \sum_{n=0}^{N} \beta_n\Psi_n(Q), \quad \beta_0 = \beta,\ \beta_n = 0,\ n > 0.
$$

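Finally, we note that the analytic random solution is

$$
u(t,Q) = \beta e^{-(\bar\alpha + \sigma_\alpha Q)t}.
$$

We seek approximate solutions

$$
u^K(t,Q) = \sum_{k=0}^{K} u_k(t)\,\Psi_k(Q)
$$

subject to

$$
\begin{aligned}
0 &= \Big\langle \frac{du^K}{dt} + \alpha^N u^K,\ \Psi_i \Big\rangle_{\rho}\\
&= \int_{\mathbb{R}} \sum_{k=0}^{K} \frac{du_k}{dt}(t)\,\Psi_k(q)\Psi_i(q)\rho_Q(q)\,dq + \int_{\mathbb{R}} \alpha^N(q) \sum_{k=0}^{K} u_k(t)\,\Psi_k(q)\Psi_i(q)\rho_Q(q)\,dq
\end{aligned} \tag{10.51}
$$

for $i = 0,1,\ldots,K$. Substitution of (10.50) yields the $K+1$ differential equations

$$
\frac{du_i}{dt} = -\frac{1}{\gamma_i}\sum_{n=0}^{N}\sum_{k=0}^{K} \alpha_n\,e_{ink}\,u_k(t), \tag{10.52}
$$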


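specifying the state coefficients. For Hermite basis functions, the expected values

$$
\begin{aligned}
\gamma_i &= \mathbb{E}[\Psi_i^2(Q)] = \int_{\mathbb{R}} \Psi_i^2(q)\,\rho_Q(q)\,dq,\\
e_{ink} &= \mathbb{E}[\Psi_i(Q)\Psi_n(Q)\Psi_k(Q)] = \int_{\mathbb{R}} \Psi_i(q)\Psi_n(q)\Psi_k(q)\,\rho_Q(q)\,dq
\end{aligned} \tag{10.53}
$$

have the explicit values

$$
\gamma_i = i!, \qquad
e_{ink} = \begin{cases} \dfrac{i!\,n!\,k!}{(s-i)!\,(s-n)!\,(s-k)!}, & 2s = i+n+k \text{ is even and } s \ge i,n,k,\\[1ex] 0, & \text{otherwise}. \end{cases} \tag{10.54}
$$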
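The closed form (10.54) is easy to check numerically. The sketch below, which assumes NumPy's probabilists' Hermite routines (`hermegauss`, `hermeval`) for $\Psi_k$, compares it with a Gauss–Hermite evaluation of the integral in (10.53); a 30-node rule integrates the degree $\le 15$ products used here exactly up to rounding. The function names are illustrative.

```python
import math

import numpy as np
from numpy.polynomial.hermite_e import hermegauss, hermeval


def triple_product(i, n, k):
    """Closed form (10.54) for e_ink = E[He_i(Q) He_n(Q) He_k(Q)], Q ~ N(0, 1)."""
    total = i + n + k
    if total % 2 == 1:
        return 0.0
    s = total // 2
    if s < max(i, n, k):
        return 0.0
    num = math.factorial(i) * math.factorial(n) * math.factorial(k)
    den = math.factorial(s - i) * math.factorial(s - n) * math.factorial(s - k)
    return num / den


def triple_product_quadrature(i, n, k, R=30):
    """Gauss-Hermite approximation of the integral in (10.53)."""
    q, w = hermegauss(R)
    w = w / math.sqrt(2.0 * math.pi)          # weights now sum to 1
    he = lambda m: hermeval(q, [0.0] * m + [1.0])
    return float(np.sum(he(i) * he(n) * he(k) * w))


max_rel_err = max(
    abs(triple_product(i, n, k) - triple_product_quadrature(i, n, k))
    / max(1.0, abs(triple_product(i, n, k)))
    for i in range(6) for n in range(6) for k in range(6)
)
print(max_rel_err)   # expected to be at rounding level
```

Setting $n = 0$ recovers $e_{i0i} = \gamma_i = i!$, and $n = 1$ gives the entries needed to assemble the Galerkin matrices below.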


Since

$$
u^K(0,Q) = \sum_{k=0}^{K} u_k(0)\,\Psi_k(Q) = \beta = \sum_{n=0}^{N} \beta_n\Psi_n(Q),
$$

initial conditions for (10.52) are specified by

$$
u_k(0) = \beta_k, \quad k = 0,\ldots,K.
$$
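The linear coupled system of differential equations can be expressed as the vector system

$$
\begin{aligned}
\frac{du}{dt} &= A\,u(t),\\
u(0) &= \beta,
\end{aligned} \tag{10.55}
$$

where $u(t) = [u_0(t),\ldots,u_K(t)]^T$ and $\beta = [\beta_0,\ldots,\beta_K]^T = [\beta, 0,\ldots,0]^T$. The tridiagonal matrix $A = [A_{ik}]$ has components

$$
A_{ik} = -\frac{1}{\gamma_i}\big(\alpha_0\,e_{i0k} + \alpha_1\,e_{i1k}\big)
$$

so that

$$
A = \begin{bmatrix}
-\bar\alpha & -\sigma_\alpha & & & \\
-\sigma_\alpha & -\bar\alpha & -2\sigma_\alpha & & \\
 & -\sigma_\alpha & -\bar\alpha & \ddots & \\
 & & \ddots & \ddots & -K\sigma_\alpha\\
 & & & -\sigma_\alpha & -\bar\alpha
\end{bmatrix}. \tag{10.56}
$$

The ODE system (10.55) can be numerically approximated using standard algorithms such as stiff solvers (e.g., ode15s.m) or Runge–Kutta routines (e.g., ode45.m).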

To evaluate the QoI, it follows from (10.9) and (10.10) that once the coefficients have been computed, the mean and variance of $u^K(t,Q)$ are

$$
\begin{aligned}
\mathbb{E}[u^K(t,Q)] &= u_0(t),\\
\operatorname{var}[u^K(t,Q)] &= \sum_{k=1}^{K} u_k^2(t)\,\gamma_k.
\end{aligned} \tag{10.57}
$$


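For this example, exact mean and variance values

$$
\bar u(t) = \int_{\mathbb{R}} \beta e^{-(\bar\alpha + \sigma_\alpha q)t}\,\frac{1}{\sqrt{2\pi}}e^{-q^2/2}\,dq = \beta e^{-\bar\alpha t + \sigma_\alpha^2 t^2/2}
$$

and

$$
\operatorname{var}[u(t)] = \mathbb{E}[u^2(t,Q)] - \bar u^2(t) = \beta^2 e^{-2\bar\alpha t}\big(e^{2\sigma_\alpha^2 t^2} - e^{\sigma_\alpha^2 t^2}\big)
$$

can be computed using the analytic random solution. We first note that due to the nonlinear parameter dependence, $\bar u(t)$ differs from the deterministic solution $u(t,\bar\alpha) = \beta e^{-\bar\alpha t}$ evaluated at $\bar\alpha$. Second, both $\bar u$ and $\operatorname{var}[u]$ exhibit unbounded growth for large $t$. This is due to the assumption that $\alpha \sim N(\bar\alpha, \sigma_\alpha^2)$. Despite specifying that $\bar\alpha > 0$, some realizations of $\alpha$ will be negative, which produces exponential growth. It is shown in Example 10.11 that this is avoided if $\alpha \sim U(0, \alpha_{\max})$.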
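The Galerkin system (10.55)–(10.57) is small enough to integrate directly. The sketch below builds the tridiagonal matrix (10.56), advances (10.55) with a fixed-step classical Runge–Kutta scheme (a stand-in for the MATLAB routines mentioned above, chosen so that the block needs only NumPy), and compares the mean and variance (10.57) with the exact expressions just given. The step count and the values $\bar\alpha = 1$, $\sigma_\alpha = 0.25$, $\beta = 10$, $t = 2$ are illustrative assumptions.

```python
import math

import numpy as np


def galerkin_matrix(alpha_bar, sigma_alpha, K):
    """Tridiagonal A of (10.56) for du/dt = -alpha u, alpha ~ N(alpha_bar, sigma_alpha^2)."""
    A = -alpha_bar * np.eye(K + 1)
    for i in range(K):
        # Q He_{i+1} = He_{i+2} + (i+1) He_i  ->  A[i, i+1] = -(i+1) sigma_alpha
        A[i, i + 1] = -(i + 1) * sigma_alpha
        # Q He_i = He_{i+1} + i He_{i-1}      ->  A[i+1, i] = -sigma_alpha
        A[i + 1, i] = -sigma_alpha
    return A


def rk4(A, u0, t_final, steps):
    """Classical fourth-order Runge-Kutta for the linear system du/dt = A u."""
    u = np.array(u0, dtype=float)
    dt = t_final / steps
    for _ in range(steps):
        k1 = A @ u
        k2 = A @ (u + 0.5 * dt * k1)
        k3 = A @ (u + 0.5 * dt * k2)
        k4 = A @ (u + dt * k3)
        u = u + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
    return u


def galerkin_moments(alpha_bar, sigma_alpha, beta, K, t, steps=2000):
    """Mean and variance (10.57) of the Galerkin solution at time t."""
    u0 = np.zeros(K + 1)
    u0[0] = beta                              # u_k(0) = beta_k = [beta, 0, ..., 0]
    u = rk4(galerkin_matrix(alpha_bar, sigma_alpha, K), u0, t, steps)
    variance = sum(u[k] ** 2 * math.factorial(k) for k in range(1, K + 1))
    return u[0], variance


alpha_bar, sigma_alpha, beta, t = 1.0, 0.25, 10.0, 2.0
mean, var = galerkin_moments(alpha_bar, sigma_alpha, beta, K=8, t=t)
exact_mean = beta * math.exp(-alpha_bar * t + 0.5 * sigma_alpha**2 * t**2)
exact_var = beta**2 * math.exp(-2 * alpha_bar * t) * (
    math.exp(2 * sigma_alpha**2 * t**2) - math.exp(sigma_alpha**2 * t**2))
print(mean, exact_mean, var, exact_var)
```

Increasing `t` in this sketch shows the growth discussed above: for fixed `K`, the Galerkin moments eventually lag the exact ones, which is why larger $K$ is needed for large $t$.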


Figure 10.1. True and approximate means and $2\sigma$ credible intervals with (a) $K = 8$ and (b) $K = 16$.


The true means and $2\sigma$ credible intervals, computed using $\bar\alpha = 1.0$, $\sigma_\alpha = 0.25$, $\beta = 10.0$, $\sigma_\beta = 2.0$, are compared in Figure 10.1 with approximate values given by (10.57) with $K = 8$ and $K = 16$. This illustrates that despite the fact that the representation for $\alpha^N$ truncates at $N = 1$, a larger number of basis functions is required to quantify the exponential growth for large $t$.


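Example 10.10. We now consider

$$
\begin{aligned}
\frac{du}{dt} &= -\alpha u, \quad t > 0,\\
u(0,\omega) &= \beta(\omega)
\end{aligned} \tag{10.58}
$$

with stochastic decay parameter $\alpha \sim N(\bar\alpha, \sigma_\alpha^2)$ and initial condition $\beta \sim N(\bar\beta, \sigma_\beta^2)$. We employ tensored Hermite basis functions

$$
\Psi_k(Q) = \psi_{k_1}(Q_1)\,\psi_{k_2}(Q_2),
$$

where $(k_1,k_2)$ is the multi-index associated with $k$, with the ordering summarized in Table 10.2. The state is approximated by

$$
u^K(t,Q) = \sum_{k=0}^{K} u_k(t)\,\psi_{k_1}(Q_1)\,\psi_{k_2}(Q_2) = \sum_{k=0}^{K} u_k(t)\,\Psi_k(Q),
$$

whereas the parameters have the exact representations

$$
\begin{aligned}
\alpha^N &= \sum_{n=0}^{N} \alpha_n\Psi_n(Q), \quad \alpha_0 = \bar\alpha,\ \alpha_1 = \sigma_\alpha,\ \alpha_n = 0,\ n > 1,\\
&= \bar\alpha + \sigma_\alpha Q_1,\\
\beta^N &= \sum_{n=0}^{N} \beta_n\Psi_n(Q), \quad \beta_0 = \bar\beta,\ \beta_1 = 0,\ \beta_2 = \sigma_\beta,\ \beta_n = 0,\ n > 2,\\
&= \bar\beta + \sigma_\beta Q_2,
\end{aligned}
$$

since $\Psi_0(Q) = 1$, $\Psi_1(Q) = Q_1$, and $\Psi_2(Q) = Q_2$.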
$$
\begin{array}{cccc}
k & |\mathbf{k}| & \text{Multi-index} & \text{Polynomial}\\
\hline
0 & 0 & (0,0) & \psi_0(Q_1)\psi_0(Q_2)\\
1 & 1 & (1,0) & \psi_1(Q_1)\psi_0(Q_2)\\
2 & 1 & (0,1) & \psi_0(Q_1)\psi_1(Q_2)\\
3 & 2 & (2,0) & \psi_2(Q_1)\psi_0(Q_2)\\
4 & 2 & (1,1) & \psi_1(Q_1)\psi_1(Q_2)\\
5 & 2 & (0,2) & \psi_0(Q_1)\psi_2(Q_2)
\end{array}
$$

Table 10.2. Single index, multi-index, and tensored polynomials for $p = 2$.
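One way to generate the single-index ordering of Table 10.2 is to list the multi-indices degree by degree and, within each total degree, in decreasing lexicographic order. The sketch below does this for any $p$ and total degree $P$; the ordering rule is inferred from Table 10.2 and is only one of several conventions in use. It also checks the count $K + 1 = \binom{p+P}{P}$; for $p = 3$ and $P = 2$ this gives 10 basis functions, consistent with the $K = 9$ used in Example 10.13.

```python
from itertools import product
from math import comb


def total_degree_indices(p, P):
    """Multi-indices k = (k_1, ..., k_p) with |k| <= P, graded by |k|.

    Within each total degree the indices are listed in decreasing
    lexicographic order, which reproduces the ordering of Table 10.2 for p = 2.
    """
    indices = []
    for degree in range(P + 1):
        level = [k for k in product(range(degree + 1), repeat=p) if sum(k) == degree]
        indices.extend(sorted(level, reverse=True))
    return indices


for single, multi in enumerate(total_degree_indices(2, 2)):
    print(single, sum(multi), multi)          # k, |k|, multi-index
print(len(total_degree_indices(3, 2)), comb(3 + 2, 2))
```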


With the multi-index notation, the weak formulation of the model is identical to (10.31) but with integration over $\mathbb{R}^2$. For example, the tensored normalization constants are

$$
\gamma_k = \int_{\mathbb{R}^2} \Psi_k^2(q)\,\rho_Q(q)\,dq = \int_{\mathbb{R}} \psi_{k_1}^2(q_1)\,\rho_{Q_1}(q_1)\,dq_1 \int_{\mathbb{R}} \psi_{k_2}^2(q_2)\,\rho_{Q_2}(q_2)\,dq_2 = \gamma_{k_1}\gamma_{k_2}.
$$

In a similar manner, the 1-D relations can also be used to evaluate the expected values $\mathbb{E}[\Psi_i(Q)\Psi_n(Q)\Psi_k(Q)]$. However, the matrix $A$ cannot easily be constructed using tensored 1-D matrices of the form (10.56) due to the ordering of basis elements and definition in terms of the total degree $P$. This differs from certain finite element or spectral expansions that admit mass matrix representations $\mathbf{M} = M \otimes M$, where $M$ is the 1-D mass matrix. To illustrate, we note that the matrix $A$ for $P = 2$ ($K = 5$) is



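$$
A = \begin{bmatrix}
-\bar\alpha & -\sigma_\alpha & 0 & 0 & 0 & 0\\
-\sigma_\alpha & -\bar\alpha & 0 & -2\sigma_\alpha & 0 & 0\\
0 & 0 & -\bar\alpha & 0 & -\sigma_\alpha & 0\\
0 & -\sigma_\alpha & 0 & -\bar\alpha & 0 & 0\\
0 & 0 & -\sigma_\alpha & 0 & -\bar\alpha & 0\\
0 & 0 & 0 & 0 & 0 & -\bar\alpha
\end{bmatrix}.
$$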


Hence one would typically employ exact Gauss–Hermite representations for this case rather than relying on tensored analytic 1-D relations.
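Following that remark, the sketch below assembles $A_{ik} = -\gamma_i^{-1}\,\mathbb{E}[\alpha(Q)\Psi_k(Q)\Psi_i(Q)]$ for $p = 2$, $P = 2$ by tensored Gauss–Hermite quadrature, using the ordering of Table 10.2 and NumPy's probabilists' Hermite routines. A 10-point rule in each direction integrates these low-degree polynomial integrands exactly up to rounding, so the result can be compared entry by entry with the $6\times 6$ matrix for $P = 2$ given above. The names `INDICES`, `he`, and `galerkin_matrix_2d` are illustrative.

```python
import math

import numpy as np
from numpy.polynomial.hermite_e import hermegauss, hermeval

# Multi-index ordering of Table 10.2 (p = 2, total degree P = 2)
INDICES = [(0, 0), (1, 0), (0, 1), (2, 0), (1, 1), (0, 2)]


def he(m, x):
    """Probabilists' Hermite polynomial He_m evaluated at x."""
    return hermeval(x, [0.0] * m + [1.0])


def galerkin_matrix_2d(alpha_bar, sigma_alpha, R=10):
    """A_ik = -E[alpha Psi_k Psi_i] / gamma_i by tensored Gauss-Hermite quadrature."""
    q, w = hermegauss(R)
    w = w / math.sqrt(2.0 * math.pi)
    Q1, Q2 = np.meshgrid(q, q, indexing="ij")
    W = np.outer(w, w)                          # tensored weights, rho_Q included
    alpha = alpha_bar + sigma_alpha * Q1        # alpha depends on Q_1 only
    psi = [he(a, Q1) * he(b, Q2) for a, b in INDICES]
    n = len(INDICES)
    A = np.zeros((n, n))
    for i, (i1, i2) in enumerate(INDICES):
        gamma_i = math.factorial(i1) * math.factorial(i2)
        for k in range(n):
            A[i, k] = -np.sum(W * alpha * psi[k] * psi[i]) / gamma_i
    return A


np.set_printoptions(precision=3, suppress=True)
print(galerkin_matrix_2d(1.0, 0.25))
```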

The vector differential equation is again (10.55) with $\beta = [\bar\beta, 0, \sigma_\beta, 0, \ldots, 0]^T$ and $u(t) = [u_0(t),\ldots,u_K(t)]^T$. Once the time-dependent coefficients have been determined, the mean and variance of $u^K(t,Q)$ are given by (10.57).

For the random solution

$$
u(t,Q) = (\bar\beta + \sigma_\beta Q_2)\,e^{-(\bar\alpha + \sigma_\alpha Q_1)t},
$$

the true mean and variance are

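$$
\bar u(t) = \frac{1}{2\pi}\int_{\mathbb{R}}\int_{\mathbb{R}} (\bar\beta + \sigma_\beta q_2)\,e^{-(\bar\alpha + \sigma_\alpha q_1)t}\,e^{-q_1^2/2}\,e^{-q_2^2/2}\,dq_1\,dq_2 = \bar\beta e^{-\bar\alpha t + \sigma_\alpha^2 t^2/2},
$$

$$
\operatorname{var}[u(t)] = \mathbb{E}[u^2(t,Q)] - \bar u^2(t) = e^{-2\bar\alpha t}\Big[(\bar\beta^2 + \sigma_\beta^2)\,e^{2\sigma_\alpha^2 t^2} - \bar\beta^2 e^{\sigma_\alpha^2 t^2}\Big].
$$

We again note the unbounded growth of $\bar u(t)$ and $\operatorname{var}[u(t)]$ due to the fact that the normal density admits negative realizations of $\alpha$. This can be contrasted with $\alpha \sim U(0, \alpha_{\max})$ considered in Example 10.11.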

The true and approximate means and $2\sigma$ credible bands for $\bar\alpha = 1.0$, $\sigma_\alpha = 0.25$, $\bar\beta = 10.0$, $\sigma_\beta = 2.0$, and $K = 6$ basis functions are plotted in Figure 10.2. The $2\sigma_\beta = 4$ interval at $t = 0$ reflects the variability in the initial condition. These results can be compared with those in Figure 10.1 for the single random variable $\alpha$. The slight discrepancy between the true and approximate means and credible intervals is due to the limited number of basis functions. The convergence of the method is illustrated in Exercise 10.3.



Figure 10.2. True and approximate means and $2\sigma$ credible intervals obtained with $K = 6$.


Example 10.11. To provide bounded means and variances, we consider (10.58) with the assumption that $\alpha \sim U(\tfrac12, \tfrac32)$ and $\beta \sim U(7,13)$ are uniformly distributed with means $\bar\alpha = 1$, $\bar\beta = 10.0$ and variances $\sigma_\alpha^2 = \tfrac{1}{12}$, $\sigma_\beta^2 = 3$. It follows from Example 10.5 and Exercise 4.4 that the parameters can be expressed as

$$
\alpha = 1 + \tfrac{1}{2}Q_1, \qquad \beta = 10 + 3Q_2,
$$

where $Q_1, Q_2 \sim U(-1,1)$. The analytic solution is thus

$$
u(t,Q) = (10 + 3Q_2)\,e^{-(1 + Q_1/2)t}. \tag{10.59}
$$


Figure 10.3. True mean and $2\sigma$ credible interval for the solution (10.59) for uniformly distributed parameters $\alpha$ and $\beta$.


It is established in Exercise 4.4 that the mean and variance of $u$ are

$$
\bar u(t) = \frac{20}{t}\sinh(t/2)\,e^{-t}
$$

and

$$
\operatorname{var}[u(t)] = \Big(\frac{103}{t}\sinh(t) - \frac{400}{t^2}\sinh^2(t/2)\Big)e^{-2t}.
$$

The mean and $2\sigma$ credible interval are plotted in Figure 10.3. In contrast to the normally distributed parameter $\alpha$, which admitted negative realizations, the realizations of the uniformly distributed parameter $\alpha$ are guaranteed to be nonnegative. Hence $\bar u$ and $\sigma_u$ decay for large $t$ rather than exhibiting the exponential growth observed in Example 10.10.
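The closed-form mean and variance above can be cross-checked by tensored Gauss–Legendre quadrature over $(Q_1,Q_2) \in [-1,1]^2$ with $\rho_Q = 1/4$. The sketch below assumes NumPy's `leggauss`; the 12-point rule per direction is an illustrative choice that is ample for the smooth integrands arising from (10.59).

```python
import math

import numpy as np
from numpy.polynomial.legendre import leggauss


def moments_quadrature(t, R=12):
    """Mean and variance of u(t, Q) in (10.59) by tensored Gauss-Legendre quadrature."""
    q, w = leggauss(R)
    w = w / 2.0                                   # rho = 1/2 in each direction
    Q1, Q2 = np.meshgrid(q, q, indexing="ij")
    W = np.outer(w, w)
    u = (10.0 + 3.0 * Q2) * np.exp(-(1.0 + 0.5 * Q1) * t)
    mean = float(np.sum(W * u))
    var = float(np.sum(W * u**2)) - mean**2
    return mean, var


def moments_closed_form(t):
    """Closed-form mean and variance stated above (valid for t > 0)."""
    mean = 20.0 / t * math.sinh(t / 2) * math.exp(-t)
    var = (103.0 / t * math.sinh(t) - 400.0 / t**2 * math.sinh(t / 2) ** 2) * math.exp(-2 * t)
    return mean, var


for t in (0.5, 1.0, 2.0, 5.0):
    print(t, moments_quadrature(t), moments_closed_form(t))
```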

Legendre polynomials now constitute the natural choice of basis functions. The construction of $u^K(t,Q)$ using the stochastic Galerkin method is addressed in Exercise 10.2.


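10.3.2 Boundary Value Problem

Example 10.12. Consider the heat equation

$$
\begin{aligned}
-\alpha\frac{d^2u}{dx^2} &= f(x), \quad -1 < x < 1,\\
u(-1) &= u(1) = 0,
\end{aligned} \tag{10.60}
$$

where $\alpha \sim N(\bar\alpha, \sigma_\alpha^2)$. It was shown in Example 10.4 that

$$
\alpha = \alpha^N = \bar\alpha + \sigma_\alpha Q = \sum_{n=0}^{1} \alpha_n\Psi_n(Q),
$$

where $\Psi_0(Q) = 1$ and $\Psi_1(Q) = Q$ are the first two Hermite polynomials. Since $Q \sim N(0,1)$, the density is $\rho_Q(q) = \frac{1}{\sqrt{2\pi}}e^{-q^2/2}$. Following Example 10.8, the operators $N(u,Q)$ and $S(v)$ are

$$
N(u,Q) = (\bar\alpha + \sigma_\alpha Q)\frac{du}{dx}, \qquad S(v) = \frac{dv}{dx},
$$

and the stochastic weak formulation is

$$
\int_{\mathbb{R}}\int_{-1}^{1} (\bar\alpha + \sigma_\alpha q)\frac{du}{dx}\frac{dv}{dx}\,z(q)\,\rho_Q(q)\,dx\,dq = \int_{\mathbb{R}}\int_{-1}^{1} f(x)\,v(x)\,z(q)\,\rho_Q(q)\,dx\,dq,
$$

which must hold for all $v \in H_0^1(-1,1)$ and $z \in Z$.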

To construct $u^K$, we consider a uniform partition of the interval $[-1,1]$ with points $x_j = -1 + jh$, $j = 0,\ldots,J$, and uniform stepsize $h = 2/J$. The spatial basis is

$$
\phi_j(x) = \frac{1}{h}\begin{cases} x - x_{j-1}, & x_{j-1} \le x < x_j,\\ x_{j+1} - x, & x_j \le x < x_{j+1},\\ 0, & \text{otherwise}, \end{cases}
$$

where $j = 1,\ldots,J-1$ to enforce the essential boundary conditions. The spatial subspace is then $V^J = \operatorname{span}\{\phi_j\}_{j=1}^{J-1}$. The random space is $Z^K = \operatorname{span}\{\Psi_k\}_{k=0}^{K}$, where $\Psi_k(q)$ are the Hermite polynomials, and the approximate solution is

$$
u^K(x,Q) = \sum_{k=0}^{K}\sum_{j=1}^{J-1} u_{jk}\,\phi_j(x)\,\Psi_k(Q).
$$

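Taking $v = \phi_\ell$ and $z = \Psi_i$ in the weak formulation yields

$$
\sum_{j=1}^{J-1}\sum_{k=0}^{K} u_{jk}\int_{\mathbb{R}} (\bar\alpha + \sigma_\alpha q)\,\Psi_k(q)\Psi_i(q)\,\rho_Q(q)\,dq \int_{-1}^{1} \frac{d\phi_j}{dx}\frac{d\phi_\ell}{dx}\,dx = \int_{\mathbb{R}} \Psi_i(q)\,\rho_Q(q)\,dq \int_{-1}^{1} f(x)\,\phi_\ell(x)\,dx,
$$

which holds for $\ell = 1,\ldots,J-1$ and $i = 0,\ldots,K$. With the definitions

$$
a_{\ell j} = \int_{-1}^{1} \frac{d\phi_j}{dx}\frac{d\phi_\ell}{dx}\,dx = \frac{1}{h}\begin{cases} 2, & j = \ell,\\ -1, & j = \ell - 1 \text{ or } j = \ell + 1,\\ 0, & \text{otherwise}, \end{cases}
\qquad
f_\ell = \int_{-1}^{1} f(x)\,\phi_\ell(x)\,dx,
$$

the problem can be expressed as

$$
\sum_{j=1}^{J-1} a_{\ell j}\sum_{k=0}^{K} u_{jk}\int_{\mathbb{R}} (\bar\alpha + \sigma_\alpha q)\,\Psi_k(q)\Psi_i(q)\,\rho_Q(q)\,dq = f_\ell\int_{\mathbb{R}} \Psi_i(q)\,\rho_Q(q)\,dq.
$$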
Due to the orthogonality of the Hermite polynomials, we note that

$$
\int_{\mathbb{R}} \Psi_k(q)\Psi_i(q)\,\rho_Q(q)\,dq = k!\,\delta_{ki}.
$$

Furthermore, (10.53) and (10.54) with $n = k = 0$ yield

$$
\int_{\mathbb{R}} \Psi_i(q)\,\rho_Q(q)\,dq = \begin{cases} 1, & i = 0,\\ 0, & \text{otherwise}, \end{cases}
$$

whereas the same expression with $n = 1$ and $i$, $k$ variable yields explicit values for

$$
e_{i1k} = \int_{\mathbb{R}} q\,\Psi_k(q)\Psi_i(q)\,\rho_Q(q)\,dq.
$$

This facilitates the construction of the $(J-1)(K+1)\times(J-1)(K+1)$ system required to solve for the coefficients $u_{jk}$.
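For constant $f$, this system can be assembled as a Kronecker product of the stiffness matrix $a_{\ell j}$ with the $(K+1)\times(K+1)$ matrix $G_{ik} = \bar\alpha\,k!\,\delta_{ik} + \sigma_\alpha e_{i1k}$, where (10.54) gives $e_{i1k} = (i+1)!$ for $k = i+1$ and $e_{i1k} = i!$ for $k = i-1$. The sketch below is a minimal illustration under those assumptions ($f \equiv 1$, linear elements, unknowns ordered with the parameter index varying fastest); the function name is hypothetical. Because the right-hand side is itself a Kronecker product, the solution separates into a spatial profile times a vector of Hermite coefficients, an instance of the decoupling for linear problems noted in Section 10.2.4.

```python
import math

import numpy as np


def stochastic_galerkin_bvp(alpha_bar, sigma_alpha, J, K, f=1.0):
    """Solve the (J-1)(K+1) Galerkin system of Example 10.12 for constant f.

    Returns an array U with U[j-1, k] = u_jk, j = 1..J-1, k = 0..K.
    """
    h = 2.0 / J
    n = J - 1
    stiffness = (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h  # a_lj
    load = f * h * np.ones(n)                                             # f_l
    # G_ik = int (alpha_bar + sigma_alpha q) Psi_k Psi_i rho_Q dq
    G = np.diag([alpha_bar * math.factorial(k) for k in range(K + 1)])
    for i in range(K):
        G[i, i + 1] = G[i + 1, i] = sigma_alpha * math.factorial(i + 1)
    e0 = np.zeros(K + 1)
    e0[0] = 1.0                                  # int Psi_i rho_Q dq = delta_i0
    system = np.kron(stiffness, G)               # rows (l, i), columns (j, k)
    rhs = np.kron(load, e0)
    return np.linalg.solve(system, rhs).reshape(n, K + 1)


J = 8
x = -1.0 + (2.0 / J) * np.arange(1, J)
U = stochastic_galerkin_bvp(1.0, 0.25, J, K=4)
print(U[:, 0])          # Galerkin approximation of the mean at the interior nodes
```

With $\sigma_\alpha = 0$ the sketch returns the deterministic nodal solution $(1 - x_j^2)/(2\bar\alpha)$ in the $k = 0$ column and zeros elsewhere, which is a convenient sanity check.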


10.4 Discrete Projection Method — Example 


Example 10.13. We revisit Example 9.7, where we illustrated the propagation of uncertain parameters $Q = [m, c, k]$ through the spring model

$$
\begin{aligned}
m\frac{d^2z}{dt^2} + c\frac{dz}{dt} + kz &= F_0\cos(\omega_F t),\\
z(0) = z_0, \quad \frac{dz}{dt}(0) &= z_1,
\end{aligned} \tag{10.61}
$$

using the Monte Carlo sampling methods of Section 9.2 and perturbation methods of Section 9.3. We consider the response

$$
y(\omega_F, Q) = \frac{F_0}{\sqrt{(k - m\omega_F^2)^2 + (c\,\omega_F)^2}}, \tag{10.62}
$$

and we assume that $Q \sim N(\bar q, V)$ with mean $\bar q = [2.7, 0.21, 8.5]$ and covariance matrix $V$ given by (9.19). The QoI are the mean and standard deviation of the model response for driving frequencies $\omega_F \in [0, 2.7]$. In this example, we illustrate the propagation of uncertainties using the discrete projection method.

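The parameters are represented as

$$
\begin{aligned}
m &= \bar m\,\Psi_0(Q) + \sigma_m\Psi_1(Q) = \bar m + \sigma_m Q_1,\\
c &= \bar c\,\Psi_0(Q) + \sigma_c\Psi_2(Q) = \bar c + \sigma_c Q_2,\\
k &= \bar k\,\Psi_0(Q) + \sigma_k\Psi_3(Q) = \bar k + \sigma_k Q_3,
\end{aligned}
$$

where $\Psi_k(Q) = \psi_{k_1}(Q_1)\,\psi_{k_2}(Q_2)\,\psi_{k_3}(Q_3)$ are tensored Hermite polynomials with the ordering given in Table 10.1. The approximated response is

$$
y^K(\omega_F, Q) = \sum_{k=0}^{K} y_k(\omega_F)\,\Psi_k(Q),
$$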
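where the generalized Fourier coefficients are

$$
y_k(\omega_F) = \frac{1}{\gamma_k}\int_{\mathbb{R}^3} y(\omega_F, q)\,\Psi_k(q)\,\rho_Q(q)\,dq. \tag{10.63}
$$

The density is

$$
\rho_Q(q) = \frac{1}{(2\pi)^{3/2}}\,e^{-(q_1^2 + q_2^2 + q_3^2)/2}
$$

and $\gamma_k = \gamma_{k_1}\gamma_{k_2}\gamma_{k_3}$. As illustrated in (10.54), the individual expected values are $\gamma_{k_i} = k_i!$. Once $y_k(\omega_F)$ has been computed, it follows from Property 10.1 that

$$
\begin{aligned}
\bar y(\omega_F) &= y_0(\omega_F),\\
\operatorname{var}[y^K(\omega_F, Q)] &= \sum_{k=1}^{K} y_k^2(\omega_F)\,\gamma_k.
\end{aligned} \tag{10.64}
$$

The QoI follow directly from (10.64).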

The tensor product relation (11.7) can be used to approximate (10.63) since $p = 3$ is small. This yields the approximate relation

$$
y_k(\omega_F) \approx \frac{1}{\gamma_k}\sum_{r_1=1}^{R_1}\sum_{r_2=1}^{R_2}\sum_{r_3=1}^{R_3} y(\omega_F, q^r)\,\Psi_k(q^r)\,\rho_Q(q^r)\,w^r,
$$

where $q^r = [q^{r_1}, q^{r_2}, q^{r_3}]$ and $w^r = w^{r_1}w^{r_2}w^{r_3}$. The quadrature points and weights can be specified using the Gauss–Hermite rule on the infinite domain or Gauss–Legendre, trapezoid, or Clenshaw–Curtis rules on a finite truncated or mapped domain.


The QoI given by (10.64) with $K = 9$ and $R = 10^3$ tensored Gauss–Hermite quadrature points are compared in Figure 10.4 with the Monte Carlo mean and standard deviation (9.20) obtained with $M$ realizations of the response. It is observed that the two techniques yield essentially identical QoI. We also note that the means $\bar y(\omega_F)$ given by the discrete projection or sampling-based methods differ from the deterministic solution $y(\omega_F, \bar q)$ evaluated at $\bar q$ for driving frequencies $\omega_F$ near the natural frequency $\omega_0 = \sqrt{\bar k/\bar m} \approx 1.7743$ Hz. This is due to the nonlinear dependence of the solution on the parameters. Finally, we note that the total number $R = 10^3$ of tensored quadrature points was chosen to simplify implementation, and smaller numbers of points will yield comparable accuracy. The approximation of the QoI using sparse grid quadrature techniques is investigated in Exercise 11.1.
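A compact version of this computation is sketched below. Because the covariance matrix (9.19) is not reproduced here, the sketch treats $m$, $c$, and $k$ as independent normals with hypothetical standard deviations, sets $F_0 = 1$ in (10.62), and uses a tensored 10-point Gauss–Hermite rule with a total-degree-2 basis (10 terms, so $K = 9$). It illustrates the mechanics of (10.63)–(10.64) and is not intended to reproduce Figure 10.4; all function names are illustrative.

```python
import itertools
import math

import numpy as np
from numpy.polynomial.hermite_e import hermegauss, hermeval


def response(omega, m, c, k, F0=1.0):
    """Steady-state amplitude (10.62); F0 = 1 is an assumed normalization."""
    return F0 / np.sqrt((k - m * omega**2) ** 2 + (c * omega) ** 2)


def discrete_projection(omega, mean, std, P=2, R=10):
    """Mean and variance (10.64) from coefficients (10.63), with Q ~ N(0, I_3)."""
    q, w = hermegauss(R)
    w = w / math.sqrt(2.0 * math.pi)
    nodes = np.array(list(itertools.product(q, repeat=3)))          # R**3 points q^r
    weights = np.prod(np.array(list(itertools.product(w, repeat=3))), axis=1)
    params = np.asarray(mean) + nodes * np.asarray(std)              # [m, c, k] at q^r
    y = response(omega, params[:, 0], params[:, 1], params[:, 2])   # R**3 model runs
    # total-degree-P multi-indices, graded by degree
    indices = [idx for deg in range(P + 1)
               for idx in sorted((i for i in itertools.product(range(deg + 1), repeat=3)
                                  if sum(i) == deg), reverse=True)]
    coeffs, gammas = [], []
    for idx in indices:
        psi = np.ones(len(weights))
        gamma = 1
        for d, n in enumerate(idx):
            psi = psi * hermeval(nodes[:, d], [0.0] * n + [1.0])
            gamma *= math.factorial(n)
        coeffs.append(np.sum(y * psi * weights) / gamma)            # (10.63)
        gammas.append(gamma)
    coeffs = np.array(coeffs)
    variance = float(np.sum(coeffs[1:] ** 2 * np.array(gammas[1:])))
    return float(coeffs[0]), variance


q_bar = (2.7, 0.21, 8.5)
sigma = (0.05, 0.01, 0.1)        # hypothetical standard deviations
for omega in (0.5, 1.7743, 2.5):
    mu, var = discrete_projection(omega, q_bar, sigma)
    print(omega, mu, math.sqrt(var))
```

Only the vectorized call to `response` touches the model, so it could be replaced by any deterministic solver evaluated at the $R$ nodes; replacing the full tensor grid by a sparse grid is the subject of Chapter 11.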


10.5 Stochastic Polynomial Packages 

The status of comprehensive packages or toolboxes for stochastic polynomial methods is presently limited for two reasons: their use for uncertainty quantification is fairly recent, and they are intrinsically linked to the codes employed in underlying applications. There are a number of MATLAB codes being posted by researchers, and basic MATLAB toolboxes will become increasingly available as the field matures. It is anticipated that these toolboxes will provide an important tool for education and research based on MATLAB application codes. For large-scale applications, the constraint of interfacing with user-supplied production codes is more stringent and dictates that toolboxes must incorporate both numerical algorithms and efficient interface mechanisms. One example is the Sandia National Laboratories toolbox DAKOTA (Design Analysis Kit for Optimization and Terascale Applications) [4, 5, 71]. This C++ toolkit facilitates the use of stochastic polynomial methods for uncertainty quantification and related objectives, such as optimization, sensitivity analysis, model calibration, and reduced-order model development, by providing a flexible and extendable interface between simulation codes and linear algebra, optimization, quadrature, and interpolation algorithms. The incorporation of stochastic polynomial methods in other large-scale commercial, government laboratory, and open source packages will surely grow as the field matures.

Figure 10.4. (a) Response means and (b) standard deviations computed using the discrete projection and Monte Carlo sampling methods and deterministic solution $y(\omega_F, \bar q)$.


10.6 Exercises 

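Exercise 10.1. Consider the random differential equation

$$\frac{du}{dt} = -\alpha(\omega)\, u, \quad t > 0, \qquad u(0, \omega) = \beta(\omega), \qquad (10.65)$$

where α and β are uniformly distributed random variables with means ᾱ, β̄ > 0 and
variances σ_α², σ_β². It follows that the stochastic solution is

$$u(t, Q) = \left(\bar\beta + \sqrt{3}\,\sigma_\beta Q_2\right) e^{-(\bar\alpha + \sqrt{3}\,\sigma_\alpha Q_1)\, t},$$

where Q_1, Q_2 ~ U(-1, 1). Show that the mean and variance of u are

$$\mathbb{E}[u(t)] = \bar\beta\, e^{-\bar\alpha t}\, \frac{\sinh(\sqrt{3}\,\sigma_\alpha t)}{\sqrt{3}\,\sigma_\alpha t}$$

and

$$\operatorname{var}[u(t)] = \left(\bar\beta^2 + \sigma_\beta^2\right) \frac{\sinh(2\sqrt{3}\,\sigma_\alpha t)}{2\sqrt{3}\,\sigma_\alpha t}\, e^{-2\bar\alpha t} - \bar\beta^2\, \frac{\sinh^2(\sqrt{3}\,\sigma_\alpha t)}{3\sigma_\alpha^2 t^2}\, e^{-2\bar\alpha t}.$$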


Exercise 10.2. Consider the differential equation (10.65) with β ~ … and
α ~ U(7, 13), as assumed in Example 10.11. Use the stochastic Galerkin method to
compute the approximate solution u^N(t, Q) using N = 0 tensored Legendre basis
functions. Plot the true and approximate means and 2σ credible bands on the time
interval [0, 5].

Exercise 10.3. Compute the approximate mean and 2σ credible bands for the
random differential equation of Example 10.10 using a stochastic Galerkin expansion
with N = 10 tensored basis elements. Compare your solution with the true and
approximate values, obtained with N = 0, that are plotted in Figure 10.2.


Exercise 10.4. Repeat Exercise 10.2 using the stochastic collocation method with
various values of M. What value of M is required to achieve convergence?


Exercise 10.5. In Example 10.13, we illustrated the use of the discrete projection
method to propagate the mean and standard deviation associated with the spring
model (10.61) and response (10.62). Repeat this example using the stochastic
collocation method. How do your results compare with those obtained using discrete
projection and Monte Carlo sampling?





Chapter 11 

Sparse Grid Quadrature 
and Interpolation 
Techniques 


In this chapter, we discuss the tensor product and sparse grid quadrature and in- 
terpolation techniques required to implement the stochastic spectral methods of 
Chapter 10. This discussion necessarily focuses solely on techniques for these meth- 
ods, and more general theory is cited at relevant points in the discussion. 

11.1 Quadrature Techniques 

The Galerkin, collocation, and discrete projection methods all require the
approximation of integrals over the domain Γ ⊂ ℝ^p. For the Galerkin method, this occurs
when projecting the residual onto the space spanned by the orthogonal polynomials
(10.26), (10.42), and (10.48) and when approximating the QoI

$$\mathbb{E}[u^K(t, x, Q)] = \int_\Gamma u^K(t, x, q)\, \rho_Q(q)\, dq.$$

Discrete projection necessitates the approximation of

$$u_k(t, x) = \frac{1}{\gamma_k} \langle u, \Psi_k \rangle = \frac{1}{\gamma_k} \int_\Gamma u(t, x, q)\, \Psi_k(q)\, \rho_Q(q)\, dq, \qquad (11.1)$$

where γ_k = ⟨Ψ_k, Ψ_k⟩, along with the QoI.

For large parameter dimensions p, stochastic quadrature techniques are optimal,
and we summarize those in Section 11.1.1. In Section 11.1.2, we summarize
deterministic 1-D and tensored quadrature relations. These are applicable for low
dimensions and serve as a basis for constructing the sparse grid techniques discussed
in Section 11.1.3.

11.1.1 Stochastic Quadrature Methods 

The Monte Carlo method is the simplest stochastic quadrature technique. In
this method, a (pseudo-)random number generator is used to sample realizations
q^r = [q_1^r, ..., q_p^r] from the joint density ρ_Q(q), constructed either experimentally or
using the statistical techniques of Chapters 7 and 8. For mutually independent
components Q_i, one can independently sample from the marginal densities ρ_{Q_i}(q_i).
For R samples, the inner product in (11.1) is approximated by

$$\langle u, \Psi_k \rangle = \frac{1}{R} \sum_{r=1}^{R} u(t, x, q^r)\, \Psi_k(q^r) + \varepsilon_R,$$

where the error satisfies E[ε_R] = 0 and the standard deviation of ε_R is O(R^{-1/2})
for large R. The advantage of the method is that the convergence rate is independent
of p and does not depend on the smoothness of u(t, x, Q)Ψ_k(Q). For this reason,
stochastic methods are advantageous when p is moderate to large. As detailed in
Section 11.1.3, the dimension p at which stochastic methods become advantageous
depends on the regularity of u. The low convergence rate constitutes the primary
disadvantage of the method. As noted in Section 9.2, this convergence rate dictates
that the number of simulations must be increased by a factor of 100 to gain one
additional place of accuracy.
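The O(R^{-1/2}) behavior can be observed with a short experiment. The following is a minimal Python sketch, assuming NumPy is available; the integrand g(q) = exp(q_1 + ... + q_p) on [0, 1]^p and the helper name `mc_estimate` are hypothetical choices made only because the exact mean (e - 1)^p is known.

```python
import numpy as np

def mc_estimate(g, p, R, rng):
    """Monte Carlo estimate (1/R) sum_r g(q^r) with q^r drawn from U([0, 1]^p)."""
    q = rng.random((R, p))       # R independent samples, one row per realization q^r
    return g(q).mean()

def g(q):
    # Hypothetical smooth integrand with known mean E[g(Q)] = (e - 1)^p.
    return np.exp(q.sum(axis=1))

p = 5
exact = (np.e - 1.0) ** p
rng = np.random.default_rng(0)
for R in (10**2, 10**4, 10**6):
    err = abs(mc_estimate(g, p, R, rng) - exact)
    print(f"R = {R:>8d}   error = {err:.2e}")
```

Each hundredfold increase in R typically buys only about one additional digit, and the same code runs unchanged for any p, which reflects the dimension independence noted above.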

Improved convergence rates can be achieved by using more efficient stochastic
sampling techniques, such as Latin hypercube sampling [170] or quasi-Monte Carlo
sampling [174]. For example, quasi-Monte Carlo techniques have a convergence
rate of O((ln R)^p / R). However, these techniques are still not competitive for low to
moderate dimensions, where tensored or sparse grid techniques are advantageous.
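As an illustration of quasi-Monte Carlo sampling, the sketch below builds a Halton sequence from van der Corput radical inverses in pure Python and NumPy. Halton points are one standard low-discrepancy choice and are used here only for illustration, since the text does not prescribe a particular sequence. The integrand is the same hypothetical exp(q_1 + ... + q_p) test function.

```python
import numpy as np

def radical_inverse(n, base):
    """Van der Corput radical inverse: reflect the base-b digits of n about the radix point."""
    inv, f = 0.0, 1.0 / base
    while n > 0:
        n, digit = divmod(n, base)
        inv += digit * f
        f /= base
    return inv

def halton(R, p):
    """First R points of the p-dimensional Halton sequence (n = 1, ..., R)."""
    primes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
    assert p <= len(primes), "this sketch only supports p <= 10"
    return np.array([[radical_inverse(n, b) for b in primes[:p]] for n in range(1, R + 1)])

def g(q):
    return np.exp(q.sum(axis=1))        # hypothetical test integrand, mean (e - 1)^p

p, R = 3, 4096
exact = (np.e - 1.0) ** p
qmc_err = abs(g(halton(R, p)).mean() - exact)
mc_err = abs(g(np.random.default_rng(0).random((R, p))).mean() - exact)
print(f"quasi-Monte Carlo error {qmc_err:.1e}, Monte Carlo error {mc_err:.1e}")
```

For small p the deterministic Halton points usually give a noticeably smaller error than the same number of pseudorandom samples. The (ln R)^p factor grows quickly with p, however, which is consistent with the remark above.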


11.1.2 Deterministic Quadrature Methods: 1-D and Tensor 
Product Formulas 

The Galerkin, collocation, and discrete projection methods require the
approximation of integrals

$$I^{(p)} f = \int_\Gamma f(q)\, \rho_Q(q)\, dq,$$

Γ ⊂ ℝ^p, by sums

$$Q^{(p)} f = \sum_{r=1}^{R} f(q^r)\, w^r,$$

where the choice of quadrature points q^r and weights w^r defines the implementation
algorithm and accuracy of the method. We consider 1-D quadrature techniques first,
since they provide the basis for quadrature in multiple dimensions.

1-D Quadrature Relations

The integration and quadrature relations in one dimension are denoted by

$$I^{(1)} f = \int_\Gamma f(q)\, \rho_Q(q)\, dq \approx \sum_{r=1}^{R} f(q^r)\, w^r = Q^{(1)} f,$$

where Γ ⊂ ℝ.

restrictions associated with the assumption of mutually independent parameters and 
representations for the joint density are discussed in Section o,2 and Remark 9,4, 
Gaussian Quadrature Techniques

As detailed in [72, 221, 237, 240], Gaussian quadrature techniques constructed
using polynomials orthogonal with respect to the densities ρ_Q(q) yield optimal accuracy
for broad classes of functions. Because these polynomials are precisely the Hermite
polynomials used to represent normal densities on (-∞, ∞) and the Legendre polynomials
used for uniform densities on [-1, 1], the resulting Gaussian quadrature techniques
constitute a natural choice for the Galerkin and discrete collocation settings. For
uniform and normal densities, this yields the Gauss-Legendre and Gauss-Hermite
relations

$$I^{(1)} f = \frac{1}{2} \int_{-1}^{1} f(q)\, dq \approx \frac{1}{2} \sum_{r=1}^{R} f(q^r)\, w^r,$$

$$I^{(1)} f = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} f(q)\, e^{-q^2/2}\, dq \approx \sum_{r=1}^{R} f(q^r)\, w^r,$$

where the nodes and weights for the Gauss-Legendre rule are summarized in
Table 11.1. The weights in that table sum to 2, which accounts for the factor 1/2 in the
first relation. As noted in Example 10.2, most tables specify the points and weights for
Gauss-Hermite quadrature based on the "physicist" Hermite polynomials, which
are orthogonal with respect to the density ρ(q) = e^{-q²}. These reported nodes and
weights can be transformed for the density ρ_Q(q) = (1/√(2π)) e^{-q²/2} using the relation
(10.16).
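The physicist-to-probabilist transformation can be checked numerically. The following is a minimal sketch, assuming NumPy. NumPy's `hermgauss` returns the rule for the weight e^{-x²}, and `hermegauss` returns the rule for e^{-q²/2} with weights summing to √(2π). The helper name `normal_gauss_hermite` is hypothetical. The substitution q = √2 x turns (1/√(2π)) ∫ f(q) e^{-q²/2} dq into (1/√π) ∫ f(√2 x) e^{-x²} dx.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss      # physicists' rule, weight e^{-x^2}
from numpy.polynomial.hermite_e import hermegauss   # probabilists' rule, weight e^{-q^2/2}

def normal_gauss_hermite(R):
    """Nodes and weights approximating E[f(Q)] for Q ~ N(0, 1), from the physicists' rule."""
    x, w = hermgauss(R)
    return np.sqrt(2.0) * x, w / np.sqrt(np.pi)     # q = sqrt(2) x, weights now sum to 1

q, w = normal_gauss_hermite(3)
# A 3-point rule is exact through degree 5, so these should reproduce
# E[Q^2] = 1 and E[Q^4] = 3 up to rounding.
print(np.dot(w, q**2), np.dot(w, q**4))

# Cross-check against NumPy's probabilists' rule after normalizing its weights.
qe, we = hermegauss(3)
print(np.allclose(q, qe), np.allclose(w, we / np.sqrt(2.0 * np.pi)))
```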

Gauss-Legendre quadrature relations with R nodes integrate polynomials of degree
2R - 1 exactly. However, Table 11.1 illustrates that the nodes are not nested; that is,
the set of nodes for R points does not contain the nodes for R - 1 points. This
motivates the construction of nested quadrature techniques.
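The entries of Table 11.1 and the degree-(2R - 1) exactness can be reproduced with NumPy's `leggauss`, assuming NumPy is available. Its weights are for ∫_{-1}^{1} f(q) dq without a density factor, which matches the table. The helper `gl_error` is a hypothetical name used for this sketch. Odd monomials integrate to zero by symmetry, so the check uses even degrees.

```python
import numpy as np
from numpy.polynomial.legendre import leggauss   # nodes/weights for int_{-1}^{1} f(q) dq

for R in (1, 2, 3, 4):
    q, w = leggauss(R)
    print(R, np.round(q, 6), np.round(w, 6))     # compare with Table 11.1

def gl_error(R, degree):
    """Error of the R-point Gauss-Legendre rule for int_{-1}^{1} q^degree dq."""
    q, w = leggauss(R)
    exact = 0.0 if degree % 2 else 2.0 / (degree + 1)
    return abs(np.dot(w, q**degree) - exact)

# Exact (to rounding) through degree 2R - 1; degree 2R is the first to fail.
R = 3
print(gl_error(R, 2 * R - 2), gl_error(R, 2 * R))
```

Printing the nodes for R = 2 and R = 3 side by side also shows the lack of nesting: ±1/√3 does not reappear at R = 3.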


R    Nodes q^r                       Weights w^r
1    0                               2
2    ±1/√3                           1
3    0                               8/9
     ±√(3/5)                         5/9
4    ±√(3/7 - (2/7)√(6/5))           (18 + √30)/36
     ±√(3/7 + (2/7)√(6/5))           (18 - √30)/36

Table 11.1. Nodes and weights for Gauss-Legendre quadrature on [-1, 1].


Nested Quadrature Techniques

We consider nested nodal structures in which the number of nodes or quadrature
points at level ℓ is denoted by R_ℓ, and we denote the points and weights at each
level by q_ℓ^r and w_ℓ^r. The 1-D quadrature rule is thus

$$Q_\ell^{(1)} f = \sum_{r=1}^{R_\ell} f(q_\ell^r)\, w_\ell^r.$$


To simplify the discussion, we consider the case of a uniform density on the
interval [0, 1]. For a random variable q defined on (a, b) with the density ρ_Q(q) > 0,
one can map the parameter space to the cdf, which is defined on [0, 1], using the
bijective mapping

$$x = F(q) = \int_a^q \rho_Q(\zeta)\, d\zeta \in [0, 1] \qquad (11.2)$$

for q ∈ (a, b). For f(q), one then has the transformed integral and quadrature
relations

$$I^{(1)} f = \int_0^1 f(F^{-1}(x))\, dx$$

and

$$Q_\ell^{(1)} f = \sum_{r=1}^{R_\ell} f(q_\ell^r)\, w_\ell^r,$$

where q_ℓ^r = F^{-1}(x_ℓ^r) and w_ℓ^r is the weight on [0, 1] for the ℓth level. Without loss of
generality, we thus consider f to be posed on [0, 1] with uniform measure.
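A minimal sketch of the mapping (11.2) for a standard normal Q follows. It assumes NumPy and the Python standard library's `statistics.NormalDist.inv_cdf`. The rule on [0, 1] is a composite midpoint rule, which is a swapped-in choice made because its nodes avoid x = 0 and x = 1, where F^{-1} is infinite for a normal variable. The helper names are hypothetical.

```python
from statistics import NormalDist
import numpy as np

def midpoint_nodes(n):
    """Equally weighted interior nodes x_r = (r - 1/2)/n on [0, 1]."""
    return (np.arange(1, n + 1) - 0.5) / n

def expect_via_cdf(f, inv_cdf, n):
    """Approximate E[f(Q)] = int_0^1 f(F^{-1}(x)) dx with an n-point midpoint rule."""
    x = midpoint_nodes(n)
    q = np.array([inv_cdf(xi) for xi in x])   # mapped nodes q_r = F^{-1}(x_r)
    return np.mean(f(q))                      # each weight on [0, 1] is 1/n

std_normal = NormalDist(mu=0.0, sigma=1.0)
approx = expect_via_cdf(lambda q: q**2, std_normal.inv_cdf, 2000)
print(approx)   # close to E[Q^2] = 1; convergence is slow because F^{-1} blows up at 0 and 1
```

The slow convergence near the endpoints illustrates why rules that omit boundary nodes are preferred for unbounded parameter domains.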

The simplest nested quadrature rule is the composite trapezoid formula

$$Q_\ell^{(1)} f = \frac{h_\ell}{2}\left[f(0) + f(1) + 2\sum_{r=2}^{R_\ell - 1} f(q_\ell^r)\right], \qquad (11.3)$$

where

$$h_\ell = \frac{1}{R_\ell - 1}, \quad R_\ell = 2^{\ell - 1} + 1, \quad q_\ell^r = (r - 1)\, h_\ell \quad \text{for } r = 1, \ldots, R_\ell. \qquad (11.4)$$

The weights are thus [h_ℓ/2, h_ℓ, ..., h_ℓ, h_ℓ/2].

The nodes for levels ℓ = 1 through ℓ = 5 are shown in Figure 11.1(a) to
illustrate the nested structure. Whereas the use of equally spaced nodes simplifies
the grid construction, Example 11.6 illustrates that it can produce spurious
oscillations when employed for interpolation. This is avoided by the Clenshaw-Curtis
and Fejér rules, which we summarize next.
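The nested structure of (11.3)-(11.4) can be checked directly. The following is a minimal sketch, assuming NumPy; the function name `trapezoid_rule` is a hypothetical helper. Because the nodes are dyadic rationals, they are exactly representable in floating point, so `np.isin` can test nesting without a tolerance.

```python
import numpy as np

def trapezoid_rule(level):
    """Nodes and weights of the composite trapezoid rule (11.3)-(11.4) on [0, 1]."""
    R = 2 ** (level - 1) + 1          # R_l = 2^(l-1) + 1
    h = 1.0 / (R - 1)                 # h_l = 1/(R_l - 1)
    nodes = h * np.arange(R)          # q_l^r = (r - 1) h_l, r = 1, ..., R_l
    weights = np.full(R, h)
    weights[[0, -1]] = h / 2.0        # [h/2, h, ..., h, h/2]
    return nodes, weights

# Every node at level l reappears at level l + 1.
for level in range(1, 5):
    coarse, _ = trapezoid_rule(level)
    fine, _ = trapezoid_rule(level + 1)
    print(level, bool(np.all(np.isin(coarse, fine))))
```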

The Clenshaw-Curtis and Fejér nodes are the extrema of Chebyshev polynomials,
which are typically defined on the interval [-1, 1]. When mapped to [0, 1],
the Clenshaw-Curtis nodes are given by

$$q_\ell^r = \frac{1}{2}\left[1 - \cos\left(\frac{\pi (r - 1)}{R_\ell - 1}\right)\right], \quad r = 1, \ldots, R_\ell, \qquad (11.5)$$

where R_1 = 1 and R_ℓ = 2^{ℓ-1} + 1 for ℓ > 1. For R_1 = 1, the single node is taken to
be q_1^1 = 1/2. As illustrated in Figure 11.1(b), the resulting nodes are unequally
spaced and cluster at the endpoints of the interval.
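The following minimal sketch, assuming NumPy, generates the nodes (11.5) and checks both nesting and endpoint clustering. The single midpoint node at level 1 follows the convention stated above. The function name is a hypothetical helper.

```python
import numpy as np

def clenshaw_curtis_nodes(level):
    """Nested Clenshaw-Curtis nodes (11.5) mapped to [0, 1]."""
    if level == 1:
        return np.array([0.5])                     # R_1 = 1: single midpoint node
    R = 2 ** (level - 1) + 1
    r = np.arange(1, R + 1)
    return 0.5 * (1.0 - np.cos(np.pi * (r - 1) / (R - 1)))

for level in range(1, 6):
    nodes = clenshaw_curtis_nodes(level)
    print(level, len(nodes), np.round(nodes, 4))

# Nestedness holds only up to rounding, because the nodes come from cosines.
coarse, fine = clenshaw_curtis_nodes(3), clenshaw_curtis_nodes(4)
print(all(np.min(np.abs(fine - c)) < 1e-12 for c in coarse))
```

Comparing successive node spacings at level 5 shows the clustering: spacing near the endpoints is roughly ten times smaller than near the center.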
Figure 11.1. Quadrature points for levels ℓ = 1, ..., 5 for (a) the trapezoid formula
(11.3) and (b) the Clenshaw-Curtis points (11.5).


The boundary nodes are neglected in the Fejér rule, which is advantageous when
mapping random variables defined on unbounded domains, e.g., normal random
variables, since F^{-1}(0) and F^{-1}(1) are then infinite. Details regarding the
construction of nodes and weights at a given level are provided in [148]. We refer the
reader to [249] for an overview of Clenshaw-Curtis and Fejér rules and a comparison
of their accuracy with that of Gauss quadrature.

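The 1-D trapezoid and Clenshaw-Curtis rules both satisfy the error bound

$$\left| I^{(1)} f - Q_\ell^{(1)} f \right| = O(R_\ell^{-\alpha}) \qquad (11.6)$$

for f ∈ C^α[0, 1], with α ≤ 2 for the trapezoid rule, whose error is O(R_ℓ^{-2}) for
smoother nonperiodic f. This forms the basis for the tensor product and sparse grid
error formulas.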
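To see the difference between the two rules, the sketch below integrates the hypothetical test function f(q) = e^q on [0, 1] at levels 1 through 5. It assumes NumPy. The Clenshaw-Curtis weights are computed with the classical cosine-sum formula (the form popularized by Trefethen's `clencurt`), rescaled from [-1, 1] to [0, 1]. This is a standard construction assumed here; it is not taken from the text.

```python
import numpy as np

def trapezoid_rule(level):
    R = 2 ** (level - 1) + 1
    h = 1.0 / (R - 1)
    w = np.full(R, h)
    w[[0, -1]] = h / 2.0
    return h * np.arange(R), w

def clenshaw_curtis_rule(level):
    """Clenshaw-Curtis nodes (11.5) and weights on [0, 1] via the cosine-sum formula."""
    if level == 1:
        return np.array([0.5]), np.array([1.0])
    n = 2 ** (level - 1)                         # n = R_l - 1, even for level >= 2
    theta = np.pi * np.arange(n + 1) / n
    w = np.empty(n + 1)
    v = np.ones(n - 1)
    for k in range(1, n // 2):
        v -= 2.0 * np.cos(2 * k * theta[1:n]) / (4 * k * k - 1)
    v -= np.cos(n * theta[1:n]) / (n * n - 1)
    w[0] = w[n] = 1.0 / (n * n - 1)
    w[1:n] = 2.0 * v / n
    # Map from [-1, 1] (weights sum to 2) to [0, 1] (weights sum to 1).
    return 0.5 * (1.0 - np.cos(theta)), 0.5 * w

f, exact = np.exp, np.e - 1.0
for level in range(1, 6):
    errs = [abs(np.dot(w, f(x)) - exact) for x, w in (trapezoid_rule(level), clenshaw_curtis_rule(level))]
    print(level, f"trapezoid {errs[0]:.1e}", f"Clenshaw-Curtis {errs[1]:.1e}")
```

The trapezoid error falls by about a factor of 4 per level, consistent with O(R_ℓ^{-2}). The Clenshaw-Curtis error for this smooth function reaches rounding level by ℓ = 5.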


Tensor Product Formulation

We now address the approximation of integrals

$$I^{(p)} f = \int_\Gamma f(q)\, \rho_Q(q)\, dq$$

on the p-dimensional hypercube Γ = [0, 1]^p. If we let Q_{ℓ_i}^{(1)} denote the quadrature
rule for the ith integration direction, then a tensor product rule is defined by

$$Q^{(p)} f = \left(Q_{\ell_1}^{(1)} \otimes \cdots \otimes Q_{\ell_p}^{(1)}\right) f = \sum_{r_1 = 1}^{R_{\ell_1}} \cdots \sum_{r_p = 1}^{R_{\ell_p}} f\left(q_{\ell_1}^{r_1}, \ldots, q_{\ell_p}^{r_p}\right) w_{\ell_1}^{r_1} \cdots w_{\ell_p}^{r_p}. \qquad (11.7)$$

The total number of quadrature points is

$$R = \prod_{i=1}^{p} R_{\ell_i},$$

or R = (R_ℓ)^p if the same number of points is used in each direction, as illustrated for
tensored Clenshaw-Curtis grids in Figure 11.2 for p = 2. This exponential growth in the
number of required nodes precludes the use of tensored quadrature relations for all
but low-dimensional problems, e.g., p no larger than 5 to 8.

Figure 11.2. Tensor product of Clenshaw-Curtis points for p = 2 and four levels
of ℓ_1 and ℓ_2.
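The tensor product (11.7) amounts to forming every combination of 1-D nodes and multiplying the corresponding weights. The following is a minimal sketch, assuming NumPy. For brevity it uses NumPy's Gauss-Legendre rule mapped to [0, 1] in place of Clenshaw-Curtis. The helper names and the test integrand are hypothetical.

```python
import itertools
import numpy as np
from numpy.polynomial.legendre import leggauss

def gauss_legendre_01(R):
    """R-point Gauss-Legendre rule mapped to [0, 1] with uniform density."""
    x, w = leggauss(R)
    return 0.5 * (x + 1.0), 0.5 * w

def tensor_rule(rules):
    """Tensor product (11.7): all node combinations, with products of 1-D weights."""
    nodes = np.array(list(itertools.product(*[r[0] for r in rules])))
    weights = np.array([np.prod(c) for c in itertools.product(*[r[1] for r in rules])])
    return nodes, weights

p, R1 = 3, 4
nodes, weights = tensor_rule([gauss_legendre_01(R1)] * p)
print(len(weights), R1 ** p)                      # R = (R_l)^p points

f = lambda q: np.prod(q**2, axis=1)               # hypothetical test integrand
print(np.dot(weights, f(nodes)), (1.0 / 3.0) ** p)
```

Raising p from 3 to 10 with the same R1 = 4 already requires 4^10 ≈ 10^6 evaluations, which is the exponential growth described above.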

The curse of dimensionality is also manifested as a significant reduction in the
convergence rate of error bounds as the dimension grows. For R = (R_ℓ)^p quadrature
points, the error for the p-dimensional Clenshaw-Curtis quadrature rule satisfies

$$\left| I^{(p)} f - Q^{(p)} f \right| = O(R^{-\alpha/p}) \qquad (11.8)$$

for functions f in the space

$$C^\alpha([0,1]^p) = \left\{ f : [0,1]^p \to \mathbb{R} \;\middle|\; \left\| \frac{\partial^{|k|} f}{\partial q_1^{k_1} \cdots \partial q_p^{k_p}} \right\|_\infty < \infty \ \text{for } |k| \le \alpha \right\} \qquad (11.9)$$

with bounded derivatives up to order α. Here k = (k_1, ..., k_p) ∈ ℕ^p is a multi-index
with |k| = k_1 + ··· + k_p. The exponential growth in the required number of
quadrature points and the diminished convergence rate motivate the use of sparse
grid quadrature techniques for problems with moderate dimensionality.

11.1.3 Sparse Grid Construction


To motivate ideas underlying sparse grid construction, we consider products of
monomials, as illustrated in Figure 11.3. The accuracy of quadrature rules is typically
quantified in terms of the degree R that can be integrated exactly.

R
0    1
1    x      y
2    x^2    xy      y^2
3    x^3    x^2y    xy^2     y^3
4    x^4    x^3y    x^2y^2   xy^3    y^4

Figure 11.3. Products of monomials up to degree 4.


To specify monomials of degree four, one need only consider the five terms listed at R = 4.
This is in contrast to a tensor product, which will have 25 terms that include
monomials such as x^4y^4 that are of higher degree than necessary. The goal with sparse grid
methods, which were originally proposed by Smolyak in [226], is to construct grids
and weights that yield the same accuracy as tensor products but with a significantly
reduced number of required points.
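The counts above can be checked directly. The following worked count compares polynomial spaces rather than node counts. Monomials of total degree at most R in p variables number C(R + p, p), whereas the tensor space with degree at most R in each variable has (R + 1)^p members:

$$\#\{x^i y^j : i + j \le 4\} = \binom{4 + 2}{2} = 15, \qquad \#\{x^i y^j : 0 \le i, j \le 4\} = 5^2 = 25,$$

$$\binom{R + p}{p}\bigg|_{R = 4,\ p = 10} = \binom{14}{4} = 1001 \quad \text{versus} \quad (R + 1)^p = 5^{10} = 9{,}765{,}625.$$

The gap between the two counts widens rapidly with p, which is the redundancy that sparse grids are designed to remove.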

There are two formulations for sparse grids that differ slightly in the hierarchy
of employed spaces. We describe the form detailed in m, which is formulated in
terms of the levels ℓ, and summarize the second formulation in Remark 11.3. In that
remark, we note that the two typically yield the same grids and weights and hence
have the same accuracy and efficiency.

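To construct a sparse grid quadrature rule, we start with the 1-D relation

$$Q_\ell^{(1)} f = \sum_{r=1}^{R_\ell} f(q_\ell^r)\, w_\ell^r, \qquad (11.10)$$

where q_ℓ^r and w_ℓ^r are the nodes and weights for the ℓth nested level, and define the
difference relations

$$\Delta_\ell^{(1)} f = \left(Q_\ell^{(1)} - Q_{\ell-1}^{(1)}\right) f$$

with Q_0^{(1)} f = 0. We note that Δ_ℓ^{(1)} f is also a quadrature formula in which the
nodes are the same as those for Q_ℓ^{(1)} f and the weights are the difference between
those for the ℓ and ℓ - 1 levels, with a level-(ℓ - 1) weight of zero at nodes that first
appear at level ℓ. The 1-D set of nodal points is denoted by

$$\Theta_\ell^{(1)} = \{q_\ell^1, \ldots, q_\ell^{R_\ell}\}. \qquad (11.11)$$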


Example 11.1. Consider the composite trapezoid formula defined by (11.3) and
(11.4). For ℓ = 2, the points and weights are Θ_2^{(1)} = {0, 1/2, 1} and [1/4, 1/2, 1/4], whereas
they are Θ_1^{(1)} = {0, 1} and [1/2, 1/2] for ℓ = 1. Thus

$$\Delta_2^{(1)} f = -\frac{1}{4} f(0) + \frac{1}{2} f(1/2) - \frac{1}{4} f(1).$$

Hence the nodes {0, 1/2, 1} for the difference rule are the same as those for Q_2^{(1)} f, but
the weights [-1/4, 1/2, -1/4] reflect the difference [1/4, 1/2, 1/4] - [1/2, 0, 1/2]. We note
that, unlike Gauss quadrature formulas and low-order Newton-Cotes formulas such as
the trapezoid rule, sparse grid quadrature relations permit negative weights.
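The difference weights in Example 11.1 can be generated mechanically. This is a minimal sketch, assuming NumPy; `trapezoid_rule` and `difference_rule` are hypothetical helpers. Because the levels are nested, each coarse node is located among the fine nodes and its coarse weight is subtracted there.

```python
import numpy as np

def trapezoid_rule(level):
    R = 2 ** (level - 1) + 1
    h = 1.0 / (R - 1)
    w = np.full(R, h)
    w[[0, -1]] = h / 2.0
    return h * np.arange(R), w

def difference_rule(level):
    """Weights of Delta_l = Q_l - Q_{l-1} on the level-l nodes, with Q_0 = 0."""
    nodes, w = trapezoid_rule(level)
    if level > 1:
        coarse_nodes, coarse_w = trapezoid_rule(level - 1)
        # Nested rules: each coarse node is also a fine node, so subtract in place.
        idx = np.searchsorted(nodes, coarse_nodes)
        w[idx] -= coarse_w
    return nodes, w

print(difference_rule(2))   # nodes [0, 0.5, 1], weights [-0.25, 0.5, -0.25]
```

For ℓ ≥ 2 the difference weights sum to zero, since both Q_ℓ and Q_{ℓ-1} integrate constants exactly.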


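The sparse quadrature formula at level ℓ is then given by

$$Q_\ell^{(p)} f = \sum_{|\ell'| \le \ell + p - 1} \left(\Delta_{\ell'_1}^{(1)} \otimes \cdots \otimes \Delta_{\ell'_p}^{(1)}\right) f, \qquad (11.12)$$

where ℓ' = (ℓ'_1, ..., ℓ'_p) ∈ ℕ^p is a multi-index with |ℓ'| = ℓ'_1 + ··· + ℓ'_p. This is in contrast
to the tensor product formulation (11.7), which can be expressed as

$$Q_\ell^{(p)} f = \sum_{\max \ell' \le \ell} \left(\Delta_{\ell'_1}^{(1)} \otimes \cdots \otimes \Delta_{\ell'_p}^{(1)}\right) f, \qquad (11.13)$$

where max ℓ' = max{ℓ'_1, ..., ℓ'_p}. The notations ‖ℓ'‖_1 and ‖ℓ'‖_∞ are often used to express
the limits in (11.12) and (11.13). Both formulas can be written as

$$Q_\ell^{(p)} f = \sum_{\ell' \in \mathcal{J}(\ell)} \left(\Delta_{\ell'_1}^{(1)} \otimes \cdots \otimes \Delta_{\ell'_p}^{(1)}\right) f, \qquad (11.14)$$

where 𝒥(ℓ) is a multi-index set that is a function of the level ℓ. For the sparse and
full grid formulas (11.12) and (11.13), the respective multi-index sets are

$$\mathcal{J}(\ell) = \left\{ \ell' \in \mathbb{N}^p : \sum_{i=1}^{p} \ell'_i \le \ell + p - 1 \right\}$$

and

$$\mathcal{J}(\ell) = \left\{ \ell' \in \mathbb{N}^p : \ell'_i \le \ell, \ i = 1, \ldots, p \right\}.$$

The nodal set for the sparse grid is

$$\Theta_\ell^{(p)} = \bigcup_{|\ell'| \le \ell + p - 1} \Theta_{\ell'_1}^{(1)} \times \cdots \times \Theta_{\ell'_p}^{(1)}. \qquad (11.15)$$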

Example 11.2. Consider the Clenshaw-Curtis rule with nodes specified by (11.5)
on the interval [0, 1], so that Θ_1^{(1)} = {1/2} and Θ_2^{(1)} = {0, 1/2, 1}. For p = 2, the
multi-index is ℓ' = (ℓ'_1, ℓ'_2), so |ℓ'| = ℓ'_1 + ℓ'_2. For ℓ = 4, the sparse grid nodal set is

$$\begin{aligned}
\Theta_4^{(2)} = {} & \left(\Theta_1^{(1)} \times \Theta_1^{(1)}\right) \\
& \cup \left(\Theta_1^{(1)} \times \Theta_2^{(1)}\right) \cup \left(\Theta_2^{(1)} \times \Theta_1^{(1)}\right) \\
& \cup \left(\Theta_1^{(1)} \times \Theta_3^{(1)}\right) \cup \left(\Theta_2^{(1)} \times \Theta_2^{(1)}\right) \cup \left(\Theta_3^{(1)} \times \Theta_1^{(1)}\right) \\
& \cup \left(\Theta_1^{(1)} \times \Theta_4^{(1)}\right) \cup \left(\Theta_2^{(1)} \times \Theta_3^{(1)}\right) \cup \left(\Theta_3^{(1)} \times \Theta_2^{(1)}\right) \cup \left(\Theta_4^{(1)} \times \Theta_1^{(1)}\right),
\end{aligned}$$

which utilizes the indices depicted in Figure 11.4(a). Combination of the points
shown in Figure 11.2 using this index pattern yields the sparse Clenshaw-Curtis
grid shown in Figure 11.4(b). The indices for the full tensor product formulation
(11.13) are depicted in Figure 11.4(c), which yields the full tensor product grid
shown in Figure 11.4(d). We note that for this case, the sparse grid has 29 points,
whereas there are 9² = 81 points in the full grid.

Figure 11.4. (a) Indices for the union of the 1-D grids Θ_{ℓ'_1}^{(1)} × Θ_{ℓ'_2}^{(1)} and (b) the
sparse Clenshaw-Curtis grid resulting from the application of these formulas to the
points in Figure 11.2. (c) Indices for the tensor product formulation (11.13) and
(d) the full tensor product grid.
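The count of 29 nodes can be reproduced by assembling the sparse rule explicitly. The following is a minimal sketch, assuming NumPy. It uses the combination form (11.16) from Remark 11.3, which is equivalent to (11.12), and merges shared nested nodes by rounding their coordinates. The Clenshaw-Curtis weights use the classical cosine-sum formula on [0, 1]. The function names are hypothetical.

```python
import itertools
from math import comb
import numpy as np

def clenshaw_curtis_rule(level):
    """Nested Clenshaw-Curtis nodes (11.5) and weights on [0, 1]."""
    if level == 1:
        return np.array([0.5]), np.array([1.0])
    n = 2 ** (level - 1)
    theta = np.pi * np.arange(n + 1) / n
    w, v = np.empty(n + 1), np.ones(n - 1)
    for k in range(1, n // 2):
        v -= 2.0 * np.cos(2 * k * theta[1:n]) / (4 * k * k - 1)
    v -= np.cos(n * theta[1:n]) / (n * n - 1)
    w[0] = w[n] = 1.0 / (n * n - 1)
    w[1:n] = 2.0 * v / n
    return 0.5 * (1.0 - np.cos(theta)), 0.5 * w

def sparse_grid(level, p):
    """Map node -> weight for the level-l sparse rule, assembled via (11.16)."""
    grid = {}
    for idx in itertools.product(range(1, level + 1), repeat=p):
        s = sum(idx)
        if not level <= s <= level + p - 1:
            continue
        coeff = (-1) ** (level + p - 1 - s) * comb(p - 1, s - level)
        rules = [list(zip(*clenshaw_curtis_rule(l))) for l in idx]
        for pts in itertools.product(*rules):
            key = tuple(round(float(q), 12) for q, _ in pts)   # merge shared nested nodes
            grid[key] = grid.get(key, 0.0) + coeff * np.prod([w for _, w in pts])
    return grid

grid = sparse_grid(4, 2)
print(len(grid))   # 29 distinct nodes, versus 9**2 = 81 for the full tensor grid
f = lambda x, y: np.exp(x + y)
print(sum(w * f(*q) for q, w in grid.items()), (np.e - 1.0) ** 2)
```

Some of the assembled weights are negative, as anticipated in Example 11.1, but they still sum to 1.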


Remark 11.3. The sparse grid formula (11.12) and nodal set are often expressed
using equivalent formulations or formulations based on slightly different hierarchies
of spaces. An equivalent formulation for the sparse grid quadrature formula (11.12),
based on Q_ℓ^{(1)} rather than Δ_ℓ^{(1)}, is

$$Q_\ell^{(p)} f = \sum_{\ell \le |\ell'| \le \ell + p - 1} (-1)^{\ell + p - 1 - |\ell'|} \binom{p - 1}{|\ell'| - \ell} \left(Q_{\ell'_1}^{(1)} \otimes \cdots \otimes Q_{\ell'_p}^{(1)}\right) f. \qquad (11.16)$$

Alternatively, many authors define the sparse quadrature rule as

$$A(q, p) f = \sum_{|k'| \le q} \left(\Delta_{k'_1}^{(1)} \otimes \cdots \otimes \Delta_{k'_p}^{(1)}\right) f \qquad (11.17)$$

or, equivalently,

$$A(q, p) f = \sum_{q - p + 1 \le |k'| \le q} (-1)^{q - |k'|} \binom{p - 1}{q - |k'|} \left(Q_{k'_1}^{(1)} \otimes \cdots \otimes Q_{k'_p}^{(1)}\right) f \qquad (11.18)$$

for integers q ≥ p. The sparse grid is defined as

$$H(q, p) = \bigcup_{q - p + 1 \le |k'| \le q} \Theta_{k'_1}^{(1)} \times \cdots \times \Theta_{k'_p}^{(1)}. \qquad (11.19)$$

We first note that if q = ℓ + p - 1, then the two formulations yield the same
nodes and have the same accuracy. In this case, the formulation (11.17)-(11.19)
utilizes fewer low-level hierarchies and, at first glance, would appear to be more
efficient to implement. As detailed in [93], however, one typically omits previously
employed points in the nested 1-D relations, which renders the two formulations
equally expensive to implement. It is established in [47] that the relation |ℓ'| ≤
ℓ + p - 1 optimizes a cost-benefit ratio, which motivates the formulations (11.12),
(11.11), and (11.16) based on the number of levels ℓ. Either formulation can be used
for the Galerkin, collocation, and discrete projection methods used for uncertainty
propagation.
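The equivalence of the difference form (11.12) and the combination form (11.16) can be confirmed numerically. The following minimal sketch assumes NumPy and uses the nested trapezoid rule (11.3)-(11.4) for simplicity. Each Δ tensor product is expanded into signed tensor products of Q rules, with Q_0 = 0. All function names and the test integrand are hypothetical.

```python
import itertools
from math import comb
import numpy as np

def trapezoid_rule(level):
    """Nested trapezoid nodes and weights (11.3)-(11.4) on [0, 1]."""
    n = 2 ** (level - 1)
    w = np.full(n + 1, 1.0 / n)
    w[[0, -1]] = 0.5 / n
    return np.linspace(0.0, 1.0, n + 1), w

def tensor_rule(levels, f):
    """Apply Q_{l_1} x ... x Q_{l_p} to f, as in (11.7)."""
    rules = [list(zip(*trapezoid_rule(l))) for l in levels]
    return sum(np.prod([w for _, w in pts]) * f(*[q for q, _ in pts])
               for pts in itertools.product(*rules))

def delta_tensor(levels, f):
    """Expand Delta_{l_1} x ... x Delta_{l_p} into signed tensor rules (Q_0 = 0)."""
    total = 0.0
    for shift in itertools.product((0, 1), repeat=len(levels)):
        lv = [l - s for l, s in zip(levels, shift)]
        if min(lv) > 0:
            total += (-1) ** sum(shift) * tensor_rule(lv, f)
    return total

def sparse_delta(level, p, f):
    """Difference form (11.12)."""
    return sum(delta_tensor(idx, f)
               for idx in itertools.product(range(1, level + 1), repeat=p)
               if sum(idx) <= level + p - 1)

def sparse_combination(level, p, f):
    """Combination form (11.16)."""
    total = 0.0
    for idx in itertools.product(range(1, level + 1), repeat=p):
        s = sum(idx)
        if level <= s <= level + p - 1:
            total += (-1) ** (level + p - 1 - s) * comb(p - 1, s - level) * tensor_rule(idx, f)
    return total

f = lambda x, y, z: np.exp(x + y + z)
print(sparse_delta(3, 3, f), sparse_combination(3, 3, f), (np.e - 1.0) ** 3)
```

The two forms agree to rounding. The combination form is usually preferred in code because it only needs existing 1-D rules.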


We compile in Table 11.2 the number of points in full and sparse grids for 
various levels and dimensions. This illustrates the immense computational savings 
that can be realized with sparse grids and demonstrates how they significantly 
increase the dimensionality of integrals that can be accurately approximated. 

Furthermore, it is shown in [185] that if we let ℛ denote the number of sparse
grid nodes used by A(q, p), then the quadrature error satisfies

$$\left| I^{(p)} f - A(q, p) f \right| = O\left(\mathcal{R}^{-\alpha} (\log \mathcal{R})^{(\alpha + 1)(p - 1)}\right) \qquad (11.20)$$

for f ∈ C^α([0, 1]^p) defined in (11.9). Comparison with the tensor product error
(11.8) illustrates that the reduced number of sparse grid nodes ℛ versus R = (R_ℓ)^p
helps push back the onset of the curse of dimensionality. However, for sufficiently
large p, the term ℛ^{-α} ceases to dominate, and the convergence rate is no longer
competitive with the rate O(R^{-1/2}) exhibited by Monte Carlo methods. For certain
problems, the efficiency of sparse grid quadrature techniques can be extended
by adaptive grid construction, which accommodates anisotropic behavior often
associated with various components q_i of the parameter vector.

p      R_ℓ    Sparse grid ℛ    Tensored grid R = (R_ℓ)^p
2      5      13               25
       9      29               81
5      5      61               3125
       9      241              59,049
10     5      221              9,765,625
       9      1581             > 3 × 10^9
50     5      5101             > 8 × 10^34
       9      171,901          > 5 × 10^47
100    5      20,201           > 7 × 10^69
       9      1,353,801        > 2 × 10^95

Table 11.2. Number of sparse grid and tensored Clenshaw-Curtis quadrature points
with R_ℓ = 5 and 9 nodes in each direction for dimensions p = 2 to 100.
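The sparse grid column of Table 11.2 can be regenerated without building any grid. This pure-Python sketch counts the distinct nodes of (11.15) for nested Clenshaw-Curtis rules. Level 1 contributes 1 node, and levels 2, 3, 4, ... contribute 2, 2, 4, 8, ... new nodes. A small dynamic program then runs over the level budget Σ_i (ℓ'_i - 1) ≤ ℓ - 1. The counting scheme and function names are illustrative assumptions, not the method used to produce the table.

```python
def new_points(m):
    """Nodes first appearing at 1-D Clenshaw-Curtis level m + 1 (1, 2, 2, 4, 8, ...)."""
    if m == 0:
        return 1
    return 2 if m == 1 else 2 ** (m - 1)

def sparse_count(level, p):
    """Distinct nodes of the level-l sparse grid (11.15) in p dimensions.

    counts[b] is the number of node combinations whose level excesses
    (l'_i - 1) sum to exactly b, subject to the budget b <= l - 1.
    """
    budget = level - 1
    counts = [1] + [0] * budget
    for _ in range(p):
        new = [0] * (budget + 1)
        for b, c in enumerate(counts):
            if c:
                for m in range(budget - b + 1):
                    new[b + m] += c * new_points(m)
        counts = new
    return sum(counts)

for p in (2, 5, 10, 50, 100):
    for level, R1 in ((3, 5), (4, 9)):
        print(f"p = {p:3d}  R_l = {R1}  sparse = {sparse_count(level, p):>9,d}  tensor = {R1**p:.3e}")
```

Here R_ℓ = 5 and 9 correspond to levels ℓ = 3 and 4, and the printed sparse counts match the table entries.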


Adaptive Sparse Grids 

In the quadrature formula (11.14), the multi-index set 𝒥(ℓ) defines the structure
of the grid. To allow grid adaptation, one can modify 𝒥(ℓ) to weight various
dimensions in accordance with their influence; e.g., the influence of random variables
in Karhunen-Loève expansions for certain random fields diminishes as the
index increases. To construct an anisotropic grid with variable accuracy for certain
random variables, one can employ the multi-index set

$$\mathcal{J}_\alpha(\ell) = \left\{ \ell' \in \mathbb{N}^p : \sum_{i=1}^{p} \alpha_i \ell'_i \le \ell + p - 1 \right\}, \qquad (11.21)$$

where α ∈ ℝ^p is a vector of weights. A sparse grid created using a weighting vector
that varies the influence of q_i is illustrated in Figure 11.5.

This strategy has two limitations: it is often difficult to prescribe α based on a
priori knowledge of the model, and the flexibility of the method to yield general grid
coarsening strategies is limited by the prespecified structure of the multi-index set 𝒥.
These issues are addressed by strategies that provide adaptation through sequential
construction of the multi-index set. We refer the reader to [148] for details and
analysis regarding more general adaptation strategies.
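The effect of the weight vector α in (11.21) is easiest to see in the index set itself. This pure-Python sketch enumerates 𝒥_α(ℓ). The restriction α_i ≥ 1, used here to bound the search, is an assumption of the sketch and not part of (11.21). The function name is hypothetical.

```python
import itertools

def anisotropic_index_set(level, alpha):
    """Multi-indices l' in N^p with sum_i alpha_i * l'_i <= level + p - 1, as in (11.21).

    Assumes alpha_i >= 1 so that each l'_i <= level + p - 1 bounds the search.
    """
    assert all(a >= 1 for a in alpha), "sketch assumes weights alpha_i >= 1"
    p = len(alpha)
    limit = level + p - 1
    return [idx for idx in itertools.product(range(1, limit + 1), repeat=p)
            if sum(a * l for a, l in zip(alpha, idx)) <= limit]

print(anisotropic_index_set(4, (1, 1)))   # isotropic set: the 10 indices with |l'| <= 5
print(anisotropic_index_set(4, (1, 2)))   # fewer levels allowed in the second direction
```

With α = (1, 2), only (1, 1), (1, 2), (2, 1), and (3, 1) survive. The second parameter is therefore resolved at most to level 2, while the first still reaches level 3.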


Remark 11.4. The adaptation discussed here pertains to anisotropies in the influence
of various random parameters. This differs from h- and p-adaptation used to
refine the accuracy of finite element or Galerkin approximations.
Figure 11.5. (a) Indices and (b) the resulting adaptive sparse grid.

Remark 11.5. The tensor product and sparse grid quadrature techniques discussed
here are distinct from cubature rules, which are specifically optimized for
multidimensional integration and hence are not based on combinations of 1-D quadrature
rules [239]. Further attributes of cubature rules in the context of stochastic spectral 
methods are discussed in 4]. 

11.2 Interpolating Polynomials for Collocation 

The collocation techniques described in Section 10.2 rely on the construction of
interpolating polynomials L_k that satisfy the property L_k(q^m) = δ_{km} at the points
q^m. Although the collocation algorithms were constructed in the context of Galerkin
methods, the convergence theory relies on properties of the interpolating polynomials
and the choice of collocation points rather than the theory of L² projections.
We note that the choice of points is fundamental to interpolation theory and poor
choices can significantly degrade the method's accuracy.

The objective for interpolation can be broadly summarized as follows. We
assume that we have a process u(q) that is a function of p inputs q = [q_1, ..., q_p]. In
the case of experiments, the true process is typically unknown and we simply have
a set of M measurements or realizations u^m = u(q^m), m = 1, ..., M, corresponding
to M values of the input vector. For mathematical models, the construction of
solutions u(q) is often computationally expensive, so we wish to infer the behavior
of u for various parameter values q based on a computationally tractable number
M of computed solutions. In both cases, we seek polynomials u^M(q) ∈ P_{M-1} that satisfy

$$u^M(q^m) = u^m = u(q^m), \quad m = 1, \ldots, M, \qquad (11.22)$$

at the specified set of interpolation points and accurately approximate u(q) for the
remaining parameter values q ∈ Γ. Since the determination of the values u^m = u(q^m)
is often computationally or experimentally expensive, we seek polynomials and
interpolation points that optimize the accuracy of the approximation u^M(q) ≈ u(q)
with the smallest possible number of interpolation points, where the dimensionality
p of q is often relatively large.

In Section 11.2.1, we describe the construction of 1-D Lagrange polynomials
and indicate one choice for specifying nonuniform interpolation points. We
summarize tensor and sparse grid techniques for p-dimensional interpolation in
Sections 11.2.2 and 11.2.3.

11.2.1 1-D Interpolation

We consider M_1 pairs (q^m, u^m), where u^m = u(q^m) are solutions computed with
realizations q^m ∈ ℝ. Throughout this discussion, we assume that the points q^m
are distinct. To approximate u(q), we seek polynomials u^{M_1}(q) ∈ P_{M_1 - 1} that satisfy

$$u^{M_1}(q^m) = u^m, \quad m = 1, \ldots, M_1. \qquad (11.23)$$

Such polynomials can be uniquely specified by

$$u^{M_1}(q) = \sum_{m=1}^{M_1} u^m L_m(q), \qquad (11.24)$$

where L_m(q) are Lagrange interpolating polynomials defined by

$$L_m(q) = \prod_{\substack{j=1 \\ j \ne m}}^{M_1} \frac{q - q^j}{q^m - q^j} = \frac{(q - q^1) \cdots (q - q^{m-1})(q - q^{m+1}) \cdots (q - q^{M_1})}{(q^m - q^1) \cdots (q^m - q^{m-1})(q^m - q^{m+1}) \cdots (q^m - q^{M_1})} \qquad (11.25)$$

for m = 1, ..., M_1. By construction, the Lagrange polynomials satisfy

$$L_m(q^n) = \delta_{mn}, \quad 1 \le m, n \le M_1, \qquad (11.26)$$

which ensures that u^{M_1}(q^m) = u^m for all points. This approach has the advantage
that representations accommodating a new point q^{M_1 + 1} can be constructed by
appending a term to (11.24) and updating the factors in (11.25). We denote the 1-D
interpolating polynomial based on M_1 distinct points by

$$\mathcal{I}^{(1)} u(q) = u^{M_1}(q). \qquad (11.27)$$

The construction of degree M_1 - 1 polynomials that interpolate M_1 pairs
only requires that the interpolation points be distinct. However, the next example
illustrates that the accuracy of the interpolating polynomial can be highly dependent
on the choice of interpolation points.
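The product formula (11.25) translates directly into code. The following is a minimal sketch, assuming NumPy. The helper names, the three nodes, and the quadratic test function are hypothetical. It checks the cardinality property (11.26) and the fact that M_1 points reproduce polynomials of degree M_1 - 1 exactly.

```python
import numpy as np

def lagrange_basis(nodes, q):
    """Evaluate L_1(q), ..., L_M(q) from the product formula (11.25)."""
    L = np.ones(len(nodes))
    for m, qm in enumerate(nodes):
        for j, qj in enumerate(nodes):
            if j != m:
                L[m] *= (q - qj) / (qm - qj)
    return L

def lagrange_interpolant(nodes, values, q):
    """u^{M_1}(q) = sum_m u^m L_m(q), as in (11.24)."""
    return float(np.dot(values, lagrange_basis(nodes, q)))

nodes = [0.0, 0.4, 1.0]
# Cardinality (11.26): the matrix [L_m(q^n)] is the identity.
print(np.array([lagrange_basis(nodes, qn) for qn in nodes]))
# Three distinct points reproduce any quadratic exactly.
u = lambda q: 2.0 * q**2 - q + 3.0
print(lagrange_interpolant(nodes, [u(q) for q in nodes], 0.7), u(0.7))
```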


Example 11.6. Consider the Runge function

$$u(q) = \frac{1}{1 + 25 q^2}$$

on the interval [-1, 1]. This classical example illustrates spurious oscillations, which
are one of the difficulties associated with interpolation using high-degree polynomials.
This is often termed Runge's phenomenon, and it can be especially pronounced
when using uniformly distributed points; for example,

$$q^j = -1 + \frac{2(j - 1)}{M_1 - 1}, \quad j = 1, \ldots, M_1,$$

as illustrated in Figure 11.6(a) and (c). Details regarding the theoretical basis for
this phenomenon can be found in [70, 248].

The use of nonuniformly spaced collocation points that cluster near the ends
of the interval can help to mitigate this problem. The abscissae

$$q^j = -\cos\left(\frac{\pi (j - 1)}{M_1 - 1}\right), \quad j = 1, \ldots, M_1, \qquad (11.28)$$

corresponding to the extrema of the Chebyshev polynomials constitute one choice
that accomplishes this goal. Figures 11.6(b) and (d) illustrate that whereas small
oscillations persist when using M_1 = 11 Chebyshev points, the interpolation using
M_1 = 25 points is devoid of oscillations.

Figure 11.6. Interpolation using M_1 = 11 and M_1 = 25 uniformly spaced points
(a), (c) and Chebyshev points (b), (d).
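The behavior in Figure 11.6 can be reproduced numerically. The following is a minimal sketch, assuming NumPy. It compares the maximum interpolation error on a sampling grid for the uniform nodes -1 + 2(j - 1)/(M_1 - 1) and the Chebyshev extrema -cos(π(j - 1)/(M_1 - 1)), j = 1, ..., M_1. The helper names are hypothetical. The Lagrange basis is evaluated from the product formula (11.25), vectorized over the evaluation points.

```python
import numpy as np

def lagrange_basis(nodes, q):
    """Matrix whose rows are L_m(q), the Lagrange polynomials (11.25), at the points q."""
    q = np.atleast_1d(np.asarray(q, dtype=float))
    L = np.ones((len(nodes), q.size))
    for m, qm in enumerate(nodes):
        for j, qj in enumerate(nodes):
            if j != m:
                L[m] *= (q - qj) / (qm - qj)
    return L

def runge(q):
    return 1.0 / (1.0 + 25.0 * q**2)

def max_interp_error(nodes, n_eval=2001):
    """Sampled maximum of |u - u^{M_1}| for the Runge function on [-1, 1]."""
    grid = np.linspace(-1.0, 1.0, n_eval)
    approx = runge(nodes) @ lagrange_basis(nodes, grid)
    return np.max(np.abs(approx - runge(grid)))

def runge_errors(M1):
    """Errors for M_1 uniformly spaced points and for M_1 Chebyshev extrema."""
    j = np.arange(1, M1 + 1)
    uniform = -1.0 + 2.0 * (j - 1) / (M1 - 1)
    chebyshev = -np.cos(np.pi * (j - 1) / (M1 - 1))
    return max_interp_error(uniform), max_interp_error(chebyshev)

for M1 in (11, 25):
    err_uniform, err_cheb = runge_errors(M1)
    print(f"M1 = {M1}: uniform {err_uniform:.2e}, Chebyshev {err_cheb:.2e}")
```

Adding uniform points makes the error worse, while adding Chebyshev points makes it better, which is the qualitative content of Figure 11.6.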

An alternative strategy to improve conditioning and reduce spurious oscillations,
even on uniform grids, is to employ lower-order or piecewise polynomials
(splines). The improved stability often outweighs the corresponding reduction in
accuracy.

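The 1-D interpolation error satisfies the bound

$$\left\| u - \mathcal{I}^{(1)} u \right\|_\infty \le E_{M_1 - 1}(u) \cdot (1 + \Lambda_{M_1}), \qquad (11.29)$$

where E_{M_1 - 1}(u) and Λ_{M_1} respectively denote the error of the best approximation of u
by polynomials of degree M_1 - 1 and the Lebesgue constant. It is shown in [29] that

$$\Lambda_{M_1} \le \frac{2}{\pi} \log(M_1 - 1) + 1, \quad M_1 \ge 2,$$

for the nodes (11.28). This yields the bound

$$\left\| u - \mathcal{I}^{(1)} u \right\|_\infty = O\left(M_1^{-\alpha} \log M_1\right) \qquad (11.30)$$

for u ∈ C^α[-1, 1]. This is analogous to the 1-D quadrature bound (11.6).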
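The Lebesgue constant Λ_{M_1} = max_q Σ_m |L_m(q)| can be estimated by sampling. The following minimal sketch assumes NumPy and compares the Chebyshev extrema with the bound above and with equally spaced nodes. Sampling on a finite grid can only underestimate the true maximum. The function name is hypothetical.

```python
import numpy as np

def lebesgue_constant(nodes, n_eval=4001):
    """Approximate Lambda = max_q sum_m |L_m(q)| on [-1, 1] by sampling."""
    q = np.linspace(-1.0, 1.0, n_eval)
    total = np.zeros_like(q)
    for m, qm in enumerate(nodes):
        Lm = np.ones_like(q)
        for j, qj in enumerate(nodes):
            if j != m:
                Lm *= (q - qj) / (qm - qj)
        total += np.abs(Lm)
    return total.max()

for M1 in (5, 9, 17):
    cheb = -np.cos(np.pi * np.arange(M1) / (M1 - 1))      # Chebyshev extrema
    uniform = np.linspace(-1.0, 1.0, M1)
    bound = 2.0 / np.pi * np.log(M1 - 1) + 1.0
    print(M1, lebesgue_constant(cheb), bound, lebesgue_constant(uniform))
```

The Chebyshev values grow only logarithmically and stay below the bound. The equally spaced values grow exponentially, which explains the behavior seen in Example 11.6 through (11.29).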
11.2.2 Multidimensional Interpolation Using Tensor Products


To construct interpolation formulas for parameters Q_1, ..., Q_p, we employ
tensor products of the 1-D interpolating polynomials and collocation points. This
is analogous to the approach used in Section 11.1.2 for multidimensional quadrature,
and additional details regarding the general approach can be found in that section.
We again assume that the mapping (11.2) can be used to transform parameter
components defined on (a, b) to the interval [0, 1], and we consider interpolation on
the hypercube [0, 1]^p.

In a manner analogous to (11.5), we define the interpolation points at the ℓth
nested level to be

$$q_\ell^m = \frac{1}{2}\left[1 - \cos\left(\frac{\pi (m - 1)}{M_\ell - 1}\right)\right], \quad m = 1, \ldots, M_\ell, \qquad (11.31)$$

where M_1 = 1 (with q_1^1 = 1/2) and M_ℓ = 2^{ℓ-1} + 1, ℓ > 1, for the Clenshaw-Curtis points.
The choice (11.31) yields nested points, which facilitates implementation and minimizes
oscillations in the manner illustrated in Example 11.6. The 1-D grid of interpolation
points is denoted by

$$\Theta_\ell^{(1)} = \{q_\ell^1, \ldots, q_\ell^{M_\ell}\}. \qquad (11.32)$$

The multidimensional tensor product interpolation grid is

$$\Theta_\ell^{(p)} = \bigcup_{\max \ell' \le \ell} \Theta_{\ell'_1}^{(1)} \times \cdots \times \Theta_{\ell'_p}^{(1)}, \qquad (11.33)$$

which has

$$M = (M_\ell)^p \qquad (11.34)$$

nodes if the same number of points is used in each direction. The exponential
growth in the number of interpolation points is the first manifestation of the curse
of dimensionality.

A tensor product of the 1-D relations (11.27) yields the p-dimensional
interpolation formula

$$\mathcal{I}^{(p)} u(q) = \left(\mathcal{I}_{\ell_1}^{(1)} \otimes \cdots \otimes \mathcal{I}_{\ell_p}^{(1)}\right) u(q) = \sum_{m_1 = 1}^{M_{\ell_1}} \cdots \sum_{m_p = 1}^{M_{\ell_p}} u\left(q_{\ell_1}^{m_1}, \ldots, q_{\ell_p}^{m_p}\right) L_{m_1}(q_1) \cdots L_{m_p}(q_p), \qquad (11.35)$$

which requires that the model be evaluated at the M = ∏_{i=1}^{p} M_{ℓ_i} interpolation
points. This will be prohibitive for all but low dimensions; e.g., it restricts p to be
no larger than 5 to 8.

For M distinct collocation points, the interpolation error satisfies

$$\left\| u - \mathcal{I}^{(p)} u \right\|_\infty = O\left(M^{-\alpha/p} (\log M)^p\right) \qquad (11.36)$$

for u ∈ C^α([0, 1]^p) defined in (11.9). We note that this is an algebraic rather
than exponential convergence rate. Derivations of these bounds in the context of
stochastic collocation are provided in [181,
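A direct implementation of (11.35) loops over all M = (M_ℓ)^p node combinations. The following is a minimal sketch, assuming NumPy, that uses the nested nodes (11.31) and the Lagrange product formula. The function names and the test function are hypothetical. With five nodes per direction (ℓ = 3), any polynomial of degree at most 4 in each variable is reproduced exactly.

```python
import itertools
import numpy as np

def cc_nodes(level):
    """Nested Clenshaw-Curtis interpolation points (11.31) on [0, 1]."""
    if level == 1:
        return np.array([0.5])
    M = 2 ** (level - 1) + 1
    return 0.5 * (1.0 - np.cos(np.pi * np.arange(M) / (M - 1)))

def lagrange_basis(nodes, x):
    """Values L_m(x) of the 1-D Lagrange polynomials (11.25)."""
    vals = np.ones(len(nodes))
    for m, qm in enumerate(nodes):
        for j, qj in enumerate(nodes):
            if j != m:
                vals[m] *= (x - qj) / (qm - qj)
    return vals

def tensor_interp(u, levels, x):
    """Tensor product interpolant (11.35) evaluated at the point x."""
    grids = [cc_nodes(l) for l in levels]
    bases = [lagrange_basis(g, xi) for g, xi in zip(grids, x)]
    total = 0.0
    for idx in itertools.product(*[range(len(g)) for g in grids]):
        node = [g[i] for g, i in zip(grids, idx)]
        total += u(*node) * np.prod([b[i] for b, i in zip(bases, idx)])
    return total

u = lambda x, y: x**2 * y**3 + 1.0             # hypothetical test function
print(tensor_interp(u, (3, 3), (0.3, 0.7)), u(0.3, 0.7))
```

The cost is one model evaluation per tensor node, here 5^2 = 25. In p dimensions the same loop would require 5^p evaluations.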
11.2.3 Sparse Grid Interpolation


The development of sparse grid interpolation techniques is analogous to the sparse
grid quadrature techniques summarized in Section 11.1.3, and sparse grids
constructed using Clenshaw-Curtis, Fejér, or Gauss points can serve as either
quadrature or interpolation points.

We let

$$\mathcal{I}_\ell^{(1)} u = \sum_{m=1}^{M_\ell} u(q_\ell^m)\, L_m(q)$$

denote the 1-D operator that interpolates at the M_ℓ points Θ_ℓ^{(1)} = {q_ℓ^1, ..., q_ℓ^{M_ℓ}} in
the ℓth nested level, where, for specificity, we define q_ℓ^m to be the Chebyshev extrema
(11.31) that make up the grid (11.32). We also define the difference relations

$$\Delta_\ell^{(1)} u = \left(\mathcal{I}_\ell^{(1)} - \mathcal{I}_{\ell-1}^{(1)}\right) u, \qquad \mathcal{I}_0^{(1)} u = 0,$$

which are also interpolation formulas. The sparse grid interpolation formula at level
ℓ is then

$$\mathcal{I}_\ell^{(p)} u = \sum_{|\ell'| \le \ell + p - 1} \left(\Delta_{\ell'_1}^{(1)} \otimes \cdots \otimes \Delta_{\ell'_p}^{(1)}\right) u.$$

This is equivalent to the formulation

$$\mathcal{I}_\ell^{(p)} u = \sum_{\ell \le |\ell'| \le \ell + p - 1} (-1)^{\ell + p - 1 - |\ell'|} \binom{p - 1}{|\ell'| - \ell} \left(\mathcal{I}_{\ell'_1}^{(1)} \otimes \cdots \otimes \mathcal{I}_{\ell'_p}^{(1)}\right) u.$$

The nodal set for the sparse grid is

$$\Theta_\ell^{(p)} = \bigcup_{|\ell'| \le \ell + p - 1} \Theta_{\ell'_1}^{(1)} \times \cdots \times \Theta_{\ell'_p}^{(1)},$$

as compared with the full tensor product interpolation grid (11.33). Further details
regarding the full and sparse grid constructs can be found in Section 11.1.3. Adaptive
sparse grid interpolation techniques can be constructed in a manner analogous
to that outlined for quadrature.

As noted in Remark 11.3, many authors define the sparse interpolation as in
(11.17) or (11.18) and the sparse grid as in (11.19). For implementation of the
stochastic collocation method, the two formulations are essentially equivalent.
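The combination formulation above can be assembled from the tensor interpolants. The following is a minimal sketch, assuming NumPy; the function names are hypothetical. For nested nodes, the resulting interpolant reproduces every polynomial in the span of the tensor spaces indexed by |ℓ'| ≤ ℓ + p - 1. It also matches u exactly at the sparse grid nodes, which the usage check below exercises at the corner node (0, 1).

```python
import itertools
from math import comb
import numpy as np

def cc_nodes(level):
    """Nested Clenshaw-Curtis interpolation points (11.31) on [0, 1]."""
    if level == 1:
        return np.array([0.5])
    M = 2 ** (level - 1) + 1
    return 0.5 * (1.0 - np.cos(np.pi * np.arange(M) / (M - 1)))

def lagrange_basis(nodes, x):
    vals = np.ones(len(nodes))
    for m, qm in enumerate(nodes):
        for j, qj in enumerate(nodes):
            if j != m:
                vals[m] *= (x - qj) / (qm - qj)
    return vals

def tensor_interp(u, levels, x):
    """Tensor product interpolant for the given 1-D levels, evaluated at x."""
    grids = [cc_nodes(l) for l in levels]
    bases = [lagrange_basis(g, xi) for g, xi in zip(grids, x)]
    total = 0.0
    for idx in itertools.product(*[range(len(g)) for g in grids]):
        node = [g[i] for g, i in zip(grids, idx)]
        total += u(*node) * np.prod([b[i] for b, i in zip(bases, idx)])
    return total

def sparse_interp(u, level, p, x):
    """Sparse grid interpolant at level l via the combination formulation."""
    total = 0.0
    for idx in itertools.product(range(1, level + 1), repeat=p):
        s = sum(idx)
        if level <= s <= level + p - 1:
            coeff = (-1) ** (level + p - 1 - s) * comb(p - 1, s - level)
            total += coeff * tensor_interp(u, idx, x)
    return total

u = lambda x, y: np.exp(x + y)
print(sparse_interp(u, 4, 2, (0.3, 0.7)), u(0.3, 0.7))
print(sparse_interp(u, 4, 2, (0.0, 1.0)), u(0.0, 1.0))   # (0, 1) is a sparse grid node
```

This sketch re-evaluates u for every tensor grid. A practical implementation would evaluate the model once per distinct sparse grid node and reuse those values.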


11.3 Sparse Grid Software 

The Sparse Grid Interpolation Toolbox provides MATLAB software for initial
investigation of sparse grid quadrature and interpolation routines. We caution the reader
that the "Clenshaw-Curtis" points designated in this toolbox are equally spaced and
are actually Newton-Cotes points. To specify the more standard Clenshaw-Curtis
nodes (11.5), one must use the designation "Chebyshev." Sparse grid capabilities
are also available in the Sandia National Laboratories toolbox DAKOTA [4, 5, 71].

11.4 Exercises

Exercise 11.1. Consider the spring model

$$m\frac{d^2z}{dt^2} + c\frac{dz}{dt} + kz = F_0\cos(\omega_F t),$$

$$z(0) = z_0, \qquad \frac{dz}{dt}(0) = z_1$$

and steady state response

$$y(\omega_F, Q) = \frac{F_0}{\sqrt{(k - m\omega_F^2)^2 + (c\omega_F)^2}},$$

where the random parameters $Q$ are defined as in Examples 9.7 and 10.13. The QoI are the mean and
standard deviation for driving frequencies $\omega_F \in [0, 2.7]$. Assume that $Q \sim N(\bar{q}, V)$
with $\bar{q} = [2.7, 0.24, 8.3]$ and $V$ given by (9.19). Use the sparse grid techniques of
Section 11.1.3 to approximate the integrals in (10.93). Compare the resulting QoI
given by (10.64) with those given by (9.20) with $M = 10^3$ Monte Carlo samples.
How does the number $R$ of sparse quadrature points compare with the $R = 10^3$
tensored points used in Example 10.13?
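A Monte Carlo baseline for this comparison might be sketched as follows. The parameter ordering $Q = [m, c, k]$, the forcing amplitude $F_0 = 1$, and the diagonal covariance are all assumptions made for illustration; substitute the covariance (9.19) and the ordering used in the cited examples when working the exercise.

```python
import numpy as np


def steady_state_amplitude(omega, m, c, k, F0=1.0):
    """Steady-state amplitude F0 / sqrt((k - m w^2)^2 + (c w)^2) of the driven spring."""
    return F0 / np.sqrt((k - m * omega**2) ** 2 + (c * omega) ** 2)


def monte_carlo_moments(omegas, qbar, V, M=1000, seed=0):
    """Monte Carlo mean and standard deviation of the response at each frequency.

    Assumes Q = [m, c, k] ~ N(qbar, V); this ordering is a hypothetical choice.
    """
    rng = np.random.default_rng(seed)
    samples = rng.multivariate_normal(qbar, V, size=M)        # shape (M, 3)
    m, c, k = (samples[:, j][:, None] for j in range(3))     # column vectors (M, 1)
    y = steady_state_amplitude(np.asarray(omegas)[None, :], m, c, k)  # (M, n_omega)
    return y.mean(axis=0), y.std(axis=0, ddof=1)


omegas = np.linspace(0.0, 2.7, 55)
qbar = np.array([2.7, 0.24, 8.3])
V = np.diag([1e-3, 1e-4, 1e-2])   # hypothetical covariance, not (9.19)
mean, std = monte_carlo_moments(omegas, qbar, V)
```

The sparse grid estimates from the exercise can then be compared frequency by frequency against `mean` and `std`.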





Chapter 12 

Prediction in the Presence 
of Model Discrepancy 


Essentially, all models are wrong, but some are useful. George E. P. Box


In Section 7.1, we considered statistical models of the form

$$Y_i = f(t_i, q) + \delta(t_i) + \varepsilon_i \qquad \text{or} \qquad Y_i = f(x_i, q) + \delta(x_i) + \varepsilon_i$$

for evolution and stationary processes with model responses $f(t_i, q)$ or $f(x_i, q)$. Here
$Y_i$ is a random variable whose realizations $y_i$ are measurements from an experiment.
Model errors or discrepancies are represented by the terms $\delta(t_i)$ or $\delta(x_i)$, which do
not depend on the physical parameters $q$, and $\varepsilon_i$ denotes measurement errors. For
the discussion in Chapters 7 and 8, we combined the model and measurement errors
into a single random variable $\varepsilon_i$, which we assumed was iid, and, for some analysis,
we required $\varepsilon_i \sim N(0, \sigma^2)$. In this chapter, we discuss issues that arise when the
assumption of iid combined errors is rendered invalid by structured, correlated, or
biased model discrepancy terms $\delta(t_i)$ or $\delta(x_i)$. Whereas there are statistical and
mathematical techniques to quantify discrepancies for certain problems, general
methods are lacking, due in part to the problem-dependent nature of the issue.
The development of robust techniques to quantify model errors for broad classes of
problems constitutes an active research area.

The next three examples illustrate differing types of model discrepancy. The
ramifications of unaccommodated model errors are discussed in Section 12.1. In
Sections 12.2 and 12.3, we discuss techniques to quantify $\delta(t_i)$ or $\delta(x_i)$, and we
indicate unresolved research issues and future research directions in Section 12.4.


Example 12.1. Consider the model

$$\frac{d^2T_s}{dx^2} = \frac{2(a+b)h}{abk}\left[T_s(x) - T_{amb}\right], \quad 0 < x < L,$$

$$\frac{dT_s}{dx}(0) = \frac{\Phi}{k}, \qquad \frac{dT_s}{dx}(L) = \frac{h}{k}\left[T_{amb} - T_s(L)\right]. \qquad (12.1)$$
As detailed in Example 3.5, this model quantifies the steady state temperature of
an uninsulated rod with source heat flux $\Phi$ at $x = 0$ and ambient air temperature
$T_{amb}$. The thermal conductivity $k$ for the aluminum and copper rods employed in
experiments is well documented, so we treat it as known. The parameters are taken
to be $q = [\Phi, h]$, where $h$ is the convective heat transfer coefficient.

In Example 7.16, we employed nonlinear least squares to estimate the parameters
using the data compiled in Table 3.2 from the aluminum rod. The residuals
in Figure 7.3(b) exhibit no discernible pattern, thus motivating the hypothesis that
the combined errors $\varepsilon_i$ are iid.

A similar least squares fit to the copper rod data in Table 3.3 yields the
parameter estimates $\Phi = -9.93$ and $h = 0.00143$ along with the fit and residuals
plotted in Figure 12.1. The residuals indicate that the combined errors in this case
are clearly not iid due to unaccommodated model errors $\delta(x_i)$. We illustrate in
Section 12.3 that algebraic or statistical techniques can be used to quantify this
form of model discrepancy.
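For experimentation, the solution of (12.1) can be written as $T_s(x) = c_1 e^{-\gamma x} + c_2 e^{\gamma x} + T_{amb}$ with $\gamma = \sqrt{2(a+b)h/(abk)}$; imposing the two boundary conditions gives the coefficients in the sketch below. The geometry ($a$, $b$, $L$ in cm) and $T_{amb}$ defaults are illustrative assumptions, not values taken from Example 3.5, so check them against that example before use.

```python
import math


def rod_temperature(x, Phi, h, k, a=0.95, b=0.95, L=70.0, T_amb=21.29):
    """Steady-state rod temperature solving (12.1).

    T(x) = c1 exp(-gamma x) + c2 exp(gamma x) + T_amb, with c1, c2 fixed by the
    flux condition at x = 0 and the Newton cooling condition at x = L.
    Geometry and T_amb defaults are illustrative assumptions.
    """
    gamma = math.sqrt(2.0 * (a + b) * h / (a * b * k))
    e_pos, e_neg = math.exp(gamma * L), math.exp(-gamma * L)
    c1 = -(Phi / (k * gamma)) * e_pos * (h + k * gamma) / (
        e_neg * (h - k * gamma) + e_pos * (h + k * gamma)
    )
    c2 = Phi / (k * gamma) + c1
    return c1 * math.exp(-gamma * x) + c2 * math.exp(gamma * x) + T_amb


# Aluminum estimates from Example 7.16, evaluated with aluminum and copper conductivities.
for x in (10.0, 38.0, 66.0):
    print(x, rod_temperature(x, -18.41, 0.00191, 2.37), rod_temperature(x, -18.41, 0.00191, 4.01))
```

Differentiating numerically at $x = 0$ and $x = L$ recovers both boundary conditions, which is a quick way to confirm the coefficients.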




Figure 12.1. (a) Model fit and (b) residuals for the copper rod.


Example 12.2. In Examples 7.16 and 12.1, we respectively obtained the parameter
estimates $\Phi = -18.41$, $h = 0.00191$ and $\Phi = -9.93$, $h = 0.00143$ using data from
aluminum and copper rods. However, $\Phi$ and $h$ are material independent, and any
material dependence should be accommodated by the thermal conductivity values
$k = 2.37$ W/(cm·°C) and $k = 4.01$ W/(cm·°C) for aluminum and copper.

To test the validity of the model to characterize the temperature distribution
for both materials using one parameter set, we simultaneously perform a least
squares fit to both datasets. The parameter estimates $\Phi = -13.75$ and $h = 0.00166$
yield the model fits shown in Figure 12.2. Because the errors produce opposite biases,
algebraic or statistical models for $\delta(x_i)$ will be ineffective in this case.

To illustrate the ramifications of unaccommodated discrepancies $\delta(x_i)$ on the
predictive capabilities of the model, we employ the solution with parameters obtained
in Example 7.16 through a fit to the aluminum data to predict the temperature
distribution for the copper rod. Specifically, we compare the model solution
obtained with $\Phi = -18.41$, $h = 0.00191$ and the copper conductivity value $k = 4.01$
with the copper data in Figure 12.3. Due to the model discrepancy, the prediction
is highly inaccurate. We illustrate in Example 12.4 that this must be addressed by
incorporating physics that is neglected in the model (12.1).

Figure 12.2. Model fit for the (a) aluminum and (b) copper rod using the simultaneously
optimized parameters $\Phi = -13.75$, $h = 0.00166$.



Figure 12.3. (a) Model fit to the aluminum data and (b) prediction for the copper
rod.


Example 12.3. Here we consider the experimental data and model discussed in
Example 3.7 for a thin cantilever beam driven by a voltage spike applied to surface-mounted
piezoelectric patches. Data consist of temporal measurements collected
using a proximity sensor at x = 128 mm. For a beam of length L, a weak model
formulation for the transverse displacements $w(t, x)$ is provided by the Euler-Bernoulli
equation




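$$\int_0^L \rho(x)\frac{\partial^2 w}{\partial t^2}\phi\,dx + \int_0^L \left[YI(x)\frac{\partial^2 w}{\partial x^2} + cI(x)\frac{\partial^3 w}{\partial x^2\,\partial t}\right]\frac{d^2\phi}{dx^2}\,dx = \int_0^L k_p V(t)\chi_p(x)\frac{d^2\phi}{dx^2}\,dx,$$

which holds for all test functions $\phi \in \mathcal{V} = \{\phi \in H^2(0, L) \mid \phi(0) = \phi'(0) = 0\}$,
where $V(t)$ is the applied voltage and $\chi_p$ is the characteristic function of the patch region. The
density, stiffness, and damping relations

$$\rho(x) = \rho h b + \rho_p h_p b_p \chi_p(x), \qquad YI(x) = YI + Y_p I_p \chi_p(x), \qquad cI(x) = cI + c_p I_p \chi_p(x)$$

reflect the differing geometry and material properties in the region covered by the
patch. The material parameters and constants are defined in Example 3.7. The
reported density 2700 kg/m³ for aluminum yields $\rho_b = \rho h b = 0.03775$, which is
fixed to ensure that the remaining parameters are identifiable. The parameter set is

$$q = [\rho_p, YI_b, YI_p, cI_b, cI_p, k_p],$$

where $\rho_p$ here denotes the lumped product $\rho_p h_p b_p$, $YI_p = Y_p I_p$, $YI_b = YI$, $cI_p = c_p I_p$, and $cI_b = cI$. The
model response is the displacement $y(t_i, q) = w(t_i, x, q)$ evaluated at the point
x = 128 mm.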


Parameter distributions were constructed using the delayed rejection adaptive
Metropolis (DRAM) algorithm discussed in Chapter 8, applied to the first second
of data. The optimized fit on the time interval [0, 1] and prediction for [1, 3.572] are
compared with the data in Figure 12.4(a), and the frequencies for the entire time
interval are plotted in Figure 12.4(b). The residuals and Q-Q plot of the residuals
are plotted in Figure 12.5.

It is observed that even though the model accurately quantifies the behavior
of the device, the residuals are heteroscedastic and exhibit clearly discernible
periodic behavior. The non-Gaussian behavior of the residuals is quantified by the
Q-Q plot. In combination, this illustrates that despite the accurate model fit, the
combined measurement and model errors are neither Gaussian nor iid.

Figure 12.4. Model fit for $t \in [0, 1]$ and predictions for $t \in [1, 3.572]$ in the (a)
time and (b) frequency domains.

Figure 12.5. (a) Residuals and (b) Q-Q plot of the residuals for [0, 3.572].
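The two diagnostics mentioned above can be computed numerically as well as read from plots. The sketch below returns Q-Q plot coordinates (using the common plotting positions $(i - 0.5)/n$, an assumption on our part) and the lag-1 autocorrelation, which is near zero for independent residuals and near one for slowly varying periodic residuals like those in Figure 12.5. The synthetic residuals are hypothetical stand-ins for the beam data.

```python
from statistics import NormalDist

import numpy as np


def qq_points(residuals):
    """Theoretical N(0,1) quantiles and sorted standardized residuals for a Q-Q plot."""
    r = np.sort(np.asarray(residuals, dtype=float))
    n = r.size
    probs = (np.arange(1, n + 1) - 0.5) / n
    theoretical = np.array([NormalDist().inv_cdf(float(p)) for p in probs])
    standardized = (r - r.mean()) / r.std(ddof=1)
    return theoretical, standardized


def lag1_autocorrelation(residuals):
    """Sample lag-1 autocorrelation; values near 0 are consistent with independence."""
    r = np.asarray(residuals, dtype=float)
    r = r - r.mean()
    return float(np.dot(r[:-1], r[1:]) / np.dot(r, r))


rng = np.random.default_rng(0)
iid_residuals = rng.standard_normal(1000)
t = np.linspace(0.0, 1.0, 1000)
periodic_residuals = np.sin(2 * np.pi * 5 * t)   # hypothetical structured residuals
print(lag1_autocorrelation(iid_residuals), lag1_autocorrelation(periodic_residuals))
```

Points of `qq_points` that lie on a straight line support Gaussian errors; curvature at the ends signals the non-Gaussian behavior seen for the beam.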

To statistically quantify the predictive capabilities of the model, we employ
the techniques of Chapter 9 to construct prediction intervals, which are plotted in
Figure 12.6. For the time interval [0.25, 0.35], which is included in the fitting interval
[0, 1], the prediction interval overlaps the correct percentage of the data. In the
truly predictive region [3, 3.572], the prediction interval includes essentially none of
the data because the errors are modeled incorrectly: the unaccommodated discrepancy
terms render the combined error $\delta(t_i) + \varepsilon_i$ non-iid.
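The coverage comparison described above amounts to counting how many observations in a time window fall inside the interval and comparing that fraction with the nominal level (for example, 0.95). A minimal sketch, with hypothetical synthetic intervals standing in for those of Figure 12.6:

```python
import numpy as np


def empirical_coverage(t, data, lower, upper, window):
    """Fraction of observations with window[0] <= t <= window[1] lying in [lower, upper]."""
    t, data = np.asarray(t), np.asarray(data)
    lower, upper = np.asarray(lower), np.asarray(upper)
    mask = (t >= window[0]) & (t <= window[1])
    inside = (data[mask] >= lower[mask]) & (data[mask] <= upper[mask])
    return float(inside.mean())


# Hypothetical intervals that track the data during calibration and drift afterwards.
t = np.linspace(0.0, 3.572, 500)
data = np.sin(t)
shift = np.where(t <= 1.0, 0.0, 0.6)
lower, upper = data + shift - 0.1, data + shift + 0.1
print(empirical_coverage(t, data, lower, upper, (0.25, 0.35)),
      empirical_coverage(t, data, lower, upper, (3.0, 3.572)))
```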


12.1 Effects of Unaccommodated Model Discrepancy 

Depending on the magnitude and nature of the model errors or discrepancy terms $\delta(t_i)$
or $\delta(x_i)$, their neglect can negatively impact model calibration and prediction in
four ways.

* It can diminish the validity of estimates q for physical parameters since all 
parameters are treated as tuning parameters whose optimized values may 
attempt to compensate for missing physics. 

* Highly correlated or heteroscedastic errors invalidate hypotheses associated
with the employed likelihoods, which can diminish the accuracy of sampling
distributions or estimated parameter distributions.

* As illustrated in Example 12.3, the neglect of model discrepancies can produce
inaccurate prediction intervals or intervals that are significantly larger than
those constructed for models that incorporate $\delta(t_i)$ in a statistically consistent
manner.

* Unaccommodated model discrepancy terms can yield highly inaccurate extrapolatory
predictions when inputs other than those used for calibration are
employed in the model. This is illustrated in Example 12.2.
Figure 12.6. Prediction intervals and data for the time intervals (a) [0, 3.572],
(b) [0.25, 0.35], and (c) [3, 3.572].


In combination, neglect of model errors can negatively impact all three components
of predictive estimation, as defined in Chapter 1. It limits the accuracy of
both parameter estimates and their uncertainties during model calibration as well
as the accuracy of predictions and prediction intervals. The resulting inaccuracy
can be especially pronounced when predictions require extrapolation from the
calibration regime, as is often the case for time-dependent problems. Finally, it can
shrink validation regimes, which further limits the model's predictive capabilities.

We illustrate two techniques to quantify model discrepancy terms $\delta(x_i)$ or
$\delta(t_i)$: incorporation of unmodeled physical or biological properties, and statistical
or algebraic techniques to quantify model errors. The first technique yields
superior extrapolatory predictive capabilities and may be the only option, but it
is problem-dependent and often very difficult to achieve. The second approach is
more general but typically yields less accurate extrapolatory predictions unless one
imposes stringent prior information for both the physical parameters and model
discrepancy terms.
12.2 Incorporation of Missing Physical Mechanisms

Example 12.4. In Example 12.2, we illustrated that algebraic or statistical techniques
will be ineffective for quantifying the discrepancy $\delta(x_i)$ associated with the
model (12.1) when considering rods of different materials. This necessitates
reformulating the model to incorporate the neglected physics which produces the
conflicting biases illustrated in Figure 12.2.

We employed three physical mechanisms when constructing the model (12.1)
in Example 3.5: conservation of energy, Newton's law of cooling along the
exposed surface, and flux inputs at the source. The first two reflect well-founded
physical principles, so we focus on the source boundary condition.

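Flux balance analogous to Newton's law yields the Robin boundary condition

$$k\frac{dT_s}{dx}(0) - \eta T_s(0) = -\eta T_{\text{source}}, \qquad (12.2)$$

where $\eta$, with units of W/(cm²·°C), is a second heat transfer coefficient, and $T_{\text{source}}$ is a
source term. We note that the original source condition

$$\frac{dT_s}{dx}(0) = \frac{\Phi}{k}$$

is an approximation of (12.2) if one takes $\Phi = -\eta T_{\text{source}}$ and neglects the term
$\eta T_s(0)$. The solution to the modified model

$$\frac{d^2T_s}{dx^2} = \frac{2(a+b)h}{abk}\left[T_s(x) - T_{amb}\right], \quad 0 < x < L,$$

$$k\frac{dT_s}{dx}(0) - \eta T_s(0) = -\eta T_{\text{source}}, \qquad \frac{dT_s}{dx}(L) = \frac{h}{k}\left[T_{amb} - T_s(L)\right] \qquad (12.3)$$

is

$$T_s(x, q) = c_1(q)e^{-\gamma x} + c_2(q)e^{\gamma x} + T_{amb},$$

where

$$\gamma = \sqrt{\frac{2(a+b)h}{abk}}$$

and

$$c_1(q) = \frac{\eta\,(T_{\text{source}} - T_{amb})\,e^{\gamma L}(h + k\gamma)}{D}, \qquad c_2(q) = \frac{\eta\,(T_{amb} - T_{\text{source}})\,e^{-\gamma L}(h - k\gamma)}{D}, \qquad (12.4)$$

with

$$D = e^{\gamma L}(h + k\gamma)(k\gamma + \eta) + e^{-\gamma L}(h - k\gamma)(k\gamma - \eta).$$

The parameter set is taken to be $q = [T_{\text{source}}, h, \eta]$. We again
suppress the parameter dependence of $\gamma$ and $D$ to clarify the notation.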


We repeat Example 12.2 with the new model (12.3). A least squares fit to the
two data sets using the thermal conductivity values k = 2.37 and k = 4.01 yields
the parameter estimates $T_{\text{source}} = -49.08$ °C, $h = 0.00172$, $\eta = -0.0841$ and the
fits shown in Figure 12.7. Comparison with the fits shown in Figure 12.2 for the
original model (12.1) shows a marked improvement in accuracy.
Figure 12.7. Fit of the model (12.3) for the (a) aluminum and (b) copper rods.


To illustrate the predictive capability of the extended model, we again estimate
$q = [T_{\text{source}}, h, \eta]$ using the aluminum data with k = 2.37 and use the calibrated
model to predict the copper temperature distribution by employing the copper
thermal conductivity value k = 4.01. The results in Figure 12.8 illustrate that the
prediction is now reasonable as compared with the original prediction illustrated
in Figure 12.3. One can ascertain the uncertainty in this prediction by simulating
with thermal conductivity values sampled from the ranges [2.04, 2.50] and [3.53, 4.01]
reported for aluminum and copper.
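The extended model can be evaluated by solving the 2-by-2 linear system that the two boundary conditions impose on $c_1$ and $c_2$, which avoids transcribing the closed form. The sketch below does this and then propagates a uniform draw of the copper conductivity over [3.53, 4.01], as suggested above. The geometry and $T_{amb}$ defaults are illustrative assumptions, and the parameter values are the joint-fit estimates as printed in this example.

```python
import numpy as np


def robin_rod_temperature(x, T_source, h, eta, k, a=0.95, b=0.95, L=70.0, T_amb=21.29):
    """Steady-state temperature for the extended model (12.3).

    c1, c2 in T = c1 exp(-gamma x) + c2 exp(gamma x) + T_amb come from a 2x2 solve
    of the two boundary conditions. Geometry and T_amb defaults are assumptions.
    """
    gamma = np.sqrt(2.0 * (a + b) * h / (a * b * k))
    system = np.array([
        # k T'(0) - eta T(0) = -eta T_source
        [-k * gamma - eta, k * gamma - eta],
        # k T'(L) + h T(L) = h T_amb
        [np.exp(-gamma * L) * (h - k * gamma), np.exp(gamma * L) * (h + k * gamma)],
    ])
    rhs = np.array([eta * (T_amb - T_source), 0.0])
    c1, c2 = np.linalg.solve(system, rhs)
    x = np.asarray(x, dtype=float)
    return c1 * np.exp(-gamma * x) + c2 * np.exp(gamma * x) + T_amb


q_hat = dict(T_source=-49.08, h=0.00172, eta=-0.0841)   # estimates as printed above
x = np.arange(10.0, 70.0, 4.0)                          # illustrative sensor locations
copper_prediction = robin_rod_temperature(x, k=4.01, **q_hat)

# Prediction envelope when k is drawn uniformly from the quoted copper range.
rng = np.random.default_rng(1)
samples = np.array([robin_rod_temperature(x, k=kc, **q_hat)
                    for kc in rng.uniform(3.53, 4.01, 200)])
band = samples.min(axis=0), samples.max(axis=0)
```

Setting `eta=0.0` returns $T_s \equiv T_{amb}$, the degenerate case discussed in Section 12.4.2.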



Figure 12.8. (a) Fit of the model (12.3) to the aluminum data and (b) prediction
for the copper rod.


Remark 12.5. A solution commonly proposed in the literature to address
model discrepancies is to incorporate physical mechanisms missing in the original
model. In the problem illustrated in Examples 12.2 and 12.4, this was essentially
the only solution and, when this approach is possible, it generally yields prediction
intervals that are tighter than those obtained using phenomenological algebraic or
statistical relations to quantify $\delta(x_i)$ or $\delta(t_i)$. For Example 12.3, one could employ
the more comprehensive Timoshenko model with loss mechanisms incorporated in
the boundary conditions at x = 0. However, this will significantly complicate
implementation, and there is no guarantee that the new model, which will have more
physical parameters, will yield iid residuals with reduced prediction intervals. More
generally, the inclusion of missing physics is often not a reasonable solution since
the physics would have been incorporated initially if physically or computationally
feasible. This motivates the consideration of algebraic or statistical techniques to
quantify model discrepancies $\delta(t_i)$ or $\delta(x_i)$.


12.3 Techniques to Quantify Model Errors 

To simplify the discussion, we focus on the spatial statistical model

$$Y_i = f(x_i, q) + \delta(x_i) + \varepsilon_i, \qquad x \in \mathbb{R},$$

and note that time-dependent problems or problems with $x \in \mathbb{R}^2$ or $\mathbb{R}^3$ can be
addressed in an analogous manner.


12.3.1 Polynomial Models

For simplicity, we consider the quadratic discrepancy representation

$$\delta(x_i) = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 \qquad (12.5)$$

and note that higher-order polynomial representations can be constructed in a similar
manner. Here $\beta_0$, $\beta_1$, and $\beta_2$ are unknown hyperparameters which we estimate,
along with the model parameters $q$, using the frequentist or Bayesian techniques
detailed in Chapters 7 and 8. The augmented parameter set is thus
$q_{aug} = [q, \beta_0, \beta_1, \beta_2]$, and, as detailed in Section 12.4, the identifiability of the augmented
parameter set and balance between $f(x_i, q)$ and $\delta(x_i)$ constitute critical issues when
quantifying model discrepancy in this manner. Finally, readers are referred to
Section 13.1.1, and particularly (13.9), for analogous construction of quadratic response
surface representations employed as surrogate models.
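Because the hyperparameters $\beta_0, \beta_1, \beta_2$ enter (12.5) linearly, one convenient way to fit the augmented model is to solve for them by linear least squares at each trial value of the physical parameters and then minimize the remaining residual over $q$ (a variable projection, or profile least squares, strategy). This is our choice for illustration, not necessarily the procedure used in Example 12.6; the single-rate exponential "physical model" and the grid search are hypothetical stand-ins for $T_s(x, q)$ and a proper optimizer.

```python
import numpy as np


def physical_model(x, q):
    """Hypothetical stand-in for f(x, q): a single-rate exponential decay."""
    return np.exp(-q * x)


def fit_quadratic_discrepancy(x, y, q_grid):
    """Least squares fit of y ~ f(x, q) + beta0 + beta1 x + beta2 x^2.

    For fixed q the betas enter linearly and are found by linear least squares;
    q is then chosen from q_grid to minimize the residual sum of squares.
    """
    design = np.column_stack([np.ones_like(x), x, x**2])   # columns of delta(x)
    best = None
    for q in q_grid:
        r = y - physical_model(x, q)
        beta, *_ = np.linalg.lstsq(design, r, rcond=None)
        sse = float(np.sum((r - design @ beta) ** 2))
        if best is None or sse < best[0]:
            best = (sse, q, beta)
    return best[1], best[2], best[0]


x = np.linspace(0.0, 4.0, 25)
y = np.exp(-0.5 * x) + 0.1 + 0.05 * x          # synthetic, noise-free data
q_hat, beta_hat, sse = fit_quadratic_discrepancy(x, y, np.linspace(0.1, 1.0, 91))
```

If the SSE profile over `q_grid` is nearly flat, the quadratic is absorbing the physics, which is the identifiability and balance issue flagged above.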

Example 12.6. In Example 12.1, we showed that the original heat model (12.1)
accurately fit the copper rod data compiled in Table 3.3 but yielded the correlated
residuals shown in Figure 12.1. Hence the combined errors $\delta(x_i) + \varepsilon_i$ are not iid.
Here we show that this can be addressed by modeling the discrepancy terms using
the quadratic representation (12.5). The complete model is thus

$$T_i = f(x_i, q) + \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \varepsilon_i, \qquad (12.6)$$

where $f(x_i, q) = T_s(x_i, q)$ is given by (3.22) and $q_{aug} = [\Phi, h, \beta_0, \beta_1, \beta_2]$. A least
squares fit to the copper data yields the parameter estimates $q_{aug} = [-9.49, 0.00135,
5.67 \times 10^{-1}, 2.94 \times 10^{-3}, -2.37 \times 10^{-4}]$ and the fit and residuals shown in
Figure 12.9. The residuals no longer exhibit a discernible pattern, thus motivating the
assumption that the measurement errors are iid. Furthermore, the estimates for $\Phi$
and $h$ are close to the values $\Phi = -9.93$ and $h = 0.00143$ obtained in Example 12.1,
which indicates that the physical model dominates the algebraic discrepancy term
$\delta(x_i)$.

Figure 12.9. (a) Fit of the combined model $f(x_i, q) + \delta(x_i)$ and (b) residuals for
the copper rod.

The fit of the model (12.6), where $f(x_i, q) = T_s(x_i, q)$ is the solution to the
extended heat model (12.3), is explored in Exercise 12.1. The construction of densities
for $q_{aug}$ using the Bayesian techniques of Chapter 8 is investigated in Exercise 12.3.


12.3.2 Kriging or Gaussian Process Models 

Polynomial discrepancy representations are intuitive and direct to implement, but
they do not provide mechanisms to incorporate known or assumed distributions for
$\delta(x_i)$. In a manner analogous to that described in Section 13.1.1 for surrogate model
construction, kriging or Gaussian process representations provide this capability.
For model evaluations at $n$ data points $x = [x_1, \ldots, x_n]$, the model discrepancy
term

$$\delta(x; \beta, \sigma^2, \theta) = [\delta(x_1; \beta, \sigma^2, \theta), \ldots, \delta(x_n; \beta, \sigma^2, \theta)]$$

is assumed to be from a multivariate normal distribution,

$$\delta \sim N(\beta\mathbf{1}, \sigma^2 R), \qquad (12.7)$$

where $\mathbf{1} = [1, \ldots, 1]$ is the $n$-vector of ones and $R$ is a symmetric, positive definite matrix
with elements

$$R_{ij} = \exp\left(-\theta|x_i - x_j|^2\right). \qquad (12.8)$$

We note that the correlation function (12.8) is the univariate version of the multivariate
representation (13.13), illustrated in Figure 13.4, that is employed for surrogate
model construction. The hyperparameters $\beta$, $\sigma^2$, and $\theta$ are combined with the physical
parameters $q$ to form the augmented parameter set $q_{aug} = [q, \beta, \sigma^2, \theta]$ to be
estimated. For spatial processes, this is often termed a kriging representation in
reference to its geophysical origin. The manner in which an interpolatory kriging or
Gaussian process model provides uncertainty bounds is illustrated in Figure 13.3.
The use of a Gaussian process model to quantify the model discrepancy term
discussed in Example 12.1 is pursued in Exercise 12.2.
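A common way to estimate the hyperparameters of (12.7) is by maximum likelihood: for a fixed $\theta$, the estimates of $\beta$ and $\sigma^2$ have closed forms, leaving a profile (concentrated) likelihood in $\theta$ alone. The sketch below evaluates it with a Cholesky factorization, assuming the squared-distance correlation $R_{ij} = \exp(-\theta|x_i - x_j|^2)$ plus a small diagonal nugget for numerical stability (the nugget, the $\theta$ grid, and the synthetic residuals are our assumptions). In practice $\delta$ would be the residuals $y_i - f(x_i, q)$ for the current $q$.

```python
import numpy as np


def correlation_matrix(x, theta, nugget=1e-8):
    """R_ij = exp(-theta |x_i - x_j|^2) plus a small diagonal nugget."""
    x = np.asarray(x, dtype=float)
    diff = x[:, None] - x[None, :]
    return np.exp(-theta * diff**2) + nugget * np.eye(x.size)


def profile_log_likelihood(delta, x, theta):
    """Profile log-likelihood of delta ~ N(beta 1, sigma^2 R) for fixed theta.

    beta and sigma^2 are replaced by their maximum likelihood estimates;
    returns (beta_hat, sigma2_hat, log_likelihood).
    """
    delta = np.asarray(delta, dtype=float)
    n = delta.size
    R = correlation_matrix(x, theta)
    chol = np.linalg.cholesky(R)                     # R = chol @ chol.T

    def solve_R(v):
        return np.linalg.solve(chol.T, np.linalg.solve(chol, v))

    ones = np.ones(n)
    beta = float(ones @ solve_R(delta) / (ones @ solve_R(ones)))
    resid = delta - beta
    sigma2 = float(resid @ solve_R(resid) / n)
    log_det_R = 2.0 * np.sum(np.log(np.diag(chol)))
    loglik = -0.5 * (n * np.log(2.0 * np.pi * sigma2) + log_det_R + n)
    return beta, sigma2, float(loglik)


x = np.linspace(10.0, 66.0, 15)                      # illustrative locations
delta = 0.3 * np.sin(x / 10.0) + 0.1                 # hypothetical residuals
thetas = np.logspace(-2, 0, 21)
best_theta = max(thetas, key=lambda th: profile_log_likelihood(delta, x, th)[2])
```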

Whereas this approach is commonly considered due to its generality and sta- 
tistical attributes, it often exhibits limitations in applications. Representative issues 
are discussed in the next section. 

12.4 Issues Pertaining to Model Discrepancy Representations

The phenomenological nature of the polynomial and kriging, or Gaussian process,
model discrepancy representations (12.5) and (12.7) has the advantage that their use
requires no knowledge of the underlying physical or biological process quantified
by the model $f(x_i, q)$. Hence these representations can be employed for a fairly
broad range of applications. The kriging or Gaussian process representations have
the additional advantage that they provide a natural mechanism for incorporating
prior knowledge about the process, provided that it is Gaussian. However, the fact
that the representations are phenomenological also imbues them with inherent
limitations, which we discuss in this section. This also renders the topic of quantifying
model discrepancy terms for general processes an open and active research area.

12.4.1 Parameter Identifiability

In Chapter 6, we discussed parameter selection techniques to isolate identifiable or
influential subspaces of model parameters or inputs. We used these techniques to
reduce the number of parameters to those which could be uniquely determined from
responses as required for OLS algorithms. We noted in Section 8.5 that Bayesian
techniques could, in some cases, be used to construct densities for nonidentifiable
parameters if one has sufficiently informative priors.

These observations also apply to the augmented parameter set $q_{aug} = [q, q_{dis}]$,
where $q_{dis} = [\beta_0, \beta_1, \beta_2]$ or $q_{dis} = [\beta, \sigma^2, \theta]$ in the model

$$Y_i = f(x_i, q) + \delta(x_i, q_{dis}) + \varepsilon_i.$$

The addition of the discrepancy terms $\delta(x_i, q_{dis})$ can yield an unidentifiable augmented
parameter set $q_{aug}$ for problems where $q$ is identifiable. In some cases, this
can be addressed using one of the following techniques.

* Employ sufficiently informative priors for the parameters and discrepancy
function to permit Bayesian inference of $q_{aug}$. In some cases, this can be
achieved by restricting the admissible parameter space based on a priori
information. For parameters with little to no prior information, this may not
be feasible.

* Iteratively solve for $q$ and $q_{dis}$ while keeping one parameter set fixed. This can
be employed if $q$ is identifiable but $q_{aug}$ is not. This approach is suboptimal
and may not yield a global minimum.
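A quick numerical symptom of the nonidentifiability described above is a rank-deficient sensitivity matrix for the augmented parameters. The sketch below, a simplified check in the spirit of the sensitivity-based selection of Chapter 6 rather than that chapter's algorithm, uses a hypothetical linear physical model $q_0 + q_1 x$: adding a constant discrepancy $\beta_0$ produces a third column identical to the first, so three parameters yield only rank 2.

```python
import numpy as np


def sensitivity_matrix(model, params, x, step=1e-6):
    """Forward-difference sensitivities d model / d params at the nominal params."""
    params = np.asarray(params, dtype=float)
    base = model(x, params)
    columns = []
    for j in range(params.size):
        shifted = params.copy()
        shifted[j] += step
        columns.append((model(x, shifted) - base) / step)
    return np.column_stack(columns)


def linear_physics(x, p):
    """Hypothetical physical model f(x, q) = q0 + q1 x."""
    return p[0] + p[1] * x


def physics_plus_offset(x, p):
    """Same model augmented with a constant discrepancy beta0 = p[2]."""
    return p[0] + p[1] * x + p[2]


x = np.linspace(0.0, 1.0, 20)
S_phys = sensitivity_matrix(linear_physics, [1.0, 2.0], x)
S_aug = sensitivity_matrix(physics_plus_offset, [1.0, 2.0, 0.5], x)
print(np.linalg.matrix_rank(S_phys, tol=1e-6), np.linalg.matrix_rank(S_aug, tol=1e-6))
```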
12.4.2 Confounding of Physical and Phenomenological Components during Extrapolation

In Examples 12.2 and 12.3, we illustrated model predictions that require extrapolation
beyond the calibration domain. In the first example, this involved predictions
using different physical coefficients, whereas in the second, it involved predictions
in time. For these examples, unaccommodated model discrepancy terms yielded
inaccurate predictions in the first case and inaccurate prediction intervals in the
second.

For the statistical model

$$Y_i = f(x_i, q) + \delta(x_i, q_{dis}) + \varepsilon_i,$$

it is the physical or biological model $f(x_i, q)$ which propagates the information
required for extrapolatory predictions. The role of the phenomenological discrepancy
function $\delta(x_i, q_{dis})$ is to ensure that the physical parameters $q$ are estimated in a
statistically consistent manner so that predictions and prediction intervals are accurate.
Because this term is phenomenological, it has essentially no predictive capabilities
outside the calibration domain unless stringent prior information is imposed on the
admissible space of functions used to construct $\delta(x_i, q_{dis})$. There are a number of
difficulties that arise when attempting to specify a class of functions for $\delta(x_i, q_{dis})$
with physically reasonable prior information.

* We illustrated in Example 12.2 that no class of functions could be used to
construct $\delta(x_i, q_{dis})$ for different materials due to opposing biases in the
residuals. We demonstrated in Example 12.4 that additional physics had to be
incorporated to improve predictions for materials not used during model
calibration.

* If insufficient prior structure is imposed on $\delta(x_i, q_{dis})$ during model calibration,
the hyperparameters can be chosen so that $\delta(x_i, q_{dis})$ overly compensates for
physical or biological mechanisms. The combined model $f(x_i, q) + \delta(x_i, q_{dis})$
can provide accurate predictions in the calibration domain but have minimal
predictive capability outside this domain.

To illustrate, consider the steady state heat model (12.3) with the nonphysical
value $\eta = 0$. From (12.4), this yields $T_s(x, q) = T_{amb}$, so the statistical
model is

$$T_i = T_{amb} + \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \beta_3 x_i^3 + \beta_4 x_i^4 + \varepsilon_i \qquad (12.9)$$

if one employs a quartic discrepancy relation. A least squares fit to determine
the hyperparameters, using the aluminum data in Table 3.2, yields the
accurate fit illustrated in Figure 12.10. However, the model has no ability
to predict the heat distribution for the copper rod. To avoid this scenario,
one must impose prior information on the representation for $\delta(x_i, q_{dis})$ that
prohibits the choice $\eta = 0$.

* The Gaussian process representation (12.7) will be ineffective for quantifying
the model discrepancy term $\delta(t_i, q_{dis})$ for the time-dependent beam equation
in Example 12.3. The quantification of $\delta(t_i, q_{dis})$ for extrapolatory predictions
of time-dependent processes constitutes an active research area.

* Gaussian process representations will not be effective for highly non-Gaussian
random processes. The development of statistical representations for such
problems is also an active research area.

Figure 12.10. Fit of the phenomenological model (12.9) to the aluminum data.
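The loss of predictive capability in (12.9) can be reproduced with a toy computation: a quartic fitted to one material has no dependence on the conductivity $k$, so its "prediction" for a second material is the same curve. In the sketch below the exponential profile, its $k$-dependent decay rate, the sensor locations, and $T_{amb}$ are hypothetical stand-ins chosen only to mimic the qualitative behavior of the rod data; they are not Tables 3.2 and 3.3.

```python
import numpy as np
from numpy.polynomial import Polynomial

T_AMB = 21.29                      # illustrative ambient temperature (assumption)
x = np.arange(10.0, 70.0, 4.0)     # illustrative sensor locations (assumption)


def stand_in_profile(x, k):
    """Hypothetical physics-like profile whose decay rate depends on conductivity k."""
    return T_AMB + 100.0 * np.exp(-np.sqrt(0.008 / k) * x)


aluminum = stand_in_profile(x, 2.37)
copper = stand_in_profile(x, 4.01)

# Model (12.9): T_amb plus a quartic in x, fitted to the "aluminum" data only.
quartic = Polynomial.fit(x, aluminum - T_AMB, deg=4)
fit = T_AMB + quartic(x)

# The quartic contains no k, so the same curve serves as the "copper prediction".
error_aluminum = float(np.max(np.abs(fit - aluminum)))
error_copper = float(np.max(np.abs(fit - copper)))
print(error_aluminum, error_copper)
```

The calibration error is small while the extrapolation error is large, mirroring Figure 12.10 and the discussion above.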


12.5 Notes and References 


The analysis of model discrepancy as a source of uncertainty in predictive simulation
models was initiated by Kennedy and O'Hagan [134], who referred to it as
model inadequacy. Statistical models that incorporate model discrepancy terms
have been employed in applications that include environment [13], hydrology [205],
climate [218, 234], and engineering [12, 108]. The use of this framework to construct
a rigorous validation procedure for simulation models is addressed in [31].
Despite this activity, the limitations detailed in Section 12.4 indicate a number of
open research issues that remain to be addressed. The primary issue concerns the
construction of the phenomenological discrepancy term $\delta(x_i, q_{dis})$ so that it does
not confound the physical model $f(x_i, q)$ when making extrapolatory predictions
outside the calibration domain. As detailed in Section 12.4.2 and [43], this
necessitates that stringent prior information be imposed on the admissible space of
functions used to construct $\delta(x_i, q_{dis})$. For some applications, control theory may
be used to construct $\delta(x_i, q_{dis})$ [202], and quantification of model discrepancies in this
manner constitutes a current research direction.


12.6 Exercises 

Exercise 12.1. Consider the steady state heat model (12.1) with the parameter
set $q = [\Phi, h]$ and the copper rod data compiled in Table 3.3. Using the thermal
conductivity value k = 4.01 for copper in the solution (3.22), repeat the analysis
of Example 12.6 to fit the combined physical plus quadratic discrepancy model
to the data. Your augmented parameter set is $q_{aug} = [\Phi, h, \beta_0, \beta_1, \beta_2]$. You can
use the MATLAB routines fminsearch.m or lsqnonlin.m to implement the least
squares fit. Do your residuals appear to be iid?

Exercise 12.2. Repeat Exercise 12.1 using the Gaussian process discrepancy
representation (12.7) rather than the quadratic model (12.5). How do the physical
parameters in the two cases compare?

Exercise 12.3. Consider the model

$$Y_i = f(x_i, q) + \delta(x_i) + \varepsilon_i,$$

where $f(x_i, q) = T_s(x_i, q)$ is the solution to the steady state heat equation (12.1)
and $\delta(x_i)$ is represented by the quadratic relation (12.5). You can use the thermal
conductivity value k = 2.37 for aluminum.

Use the DRAM algorithm discussed in Chapter 8 to construct densities for the
augmented parameter set $q_{aug} = [\Phi, h, \beta_0, \beta_1, \beta_2]$. When constructing characteristic
functions for noninformative priors, you should enforce the positive or negative
behavior of physical parameters. How do the means of your posterior distributions
compare with the least squares estimates determined in Exercise 12.1?
Chapter 13

Surrogate Models


Everything should be made as simple as possible, but not simpler. Albert Einstein

The equations of atmospheric physics (2.7), the hydrology model (2.12), the neutron
transport equations (2.14), and the nuclear thermal-hydraulic equations (2.16) and
(2.17) are complex PDEs that have numerous parameters and can require minutes
to days to perform a single forward simulation. Hence the Markov chain techniques 
of Chapter 8 and uncertainty propagation techniques of Chapter 9, which can re- 
quire thousands to millions of realizations, will often be infeasible for these models. 
Furthermore, this will be the case for many complex models quantifying distributed, 
nonlinear, multiscale, multiphysics, or coupled biological phenomena. This moti- 
vates the development of surrogate models to facilitate optimization, uncertainty 
quantification, and control design. 

We consider two frameworks when developing surrogate models. To motivate
the first, we consider the algebraic model

$$A(q)\theta = B(q) \qquad (13.1)$$

with observations

$$y = C^T(q)\theta. \qquad (13.2)$$

A motivating application is provided in Example 3.6, where a model of this form
arises from the discretization of a differential equation quantifying neutron diffusion.
There $\theta = [\phi_1, \ldots, \phi_N]^T$, where $\phi_i \approx \phi(x_i)$ approximates the flux at $x_i$, and the
parameters are $q = [D, \Sigma_a, S, \Sigma_d]$, where $D$, $\Sigma_a$, $S$, and $\Sigma_d$ respectively denote the
diffusion coefficient, a macroscopic absorption cross-section, a constant source, and
the detector cross-section. The matrix $A(q)$ and vectors $B(q)$ and $C(q)$ are defined
in (3.29) and (3.32).

The observation or response can thus be expressed as

$$y = f(q), \qquad (13.3)$$

where the nonlinear function $f$ is given by

$$f(q) = C^T(q)A^{-1}(q)B(q).$$

For fine-scale discretizations or discretizations of 2-D or 3-D models, the numerical
expense of evaluating $f$ motivates the construction of inexpensive surrogates that
retain the essential physics.
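To make the cost structure of $f(q) = C^T(q)A^{-1}(q)B(q)$ concrete, the sketch below evaluates it with a linear solve rather than an explicit inverse. Because (3.29) and (3.32) are not reproduced here, the assembly is a hypothetical stand-in: a centered finite difference discretization of $-D\phi'' + \Sigma_a\phi = S$ on $[0, L]$ with $\phi(0) = \phi(L) = 0$ and a point detector reading $\Sigma_d\phi(L/2)$. Each evaluation requires one $N \times N$ solve, which is what surrogates aim to avoid as $N$ grows.

```python
import numpy as np


def assemble(q, L=2.0, n=199):
    """Hypothetical A(q), B(q), C(q) for -D phi'' + Sigma_a phi = S, phi(0) = phi(L) = 0."""
    D, sigma_a, S, sigma_d = q
    h = L / (n + 1)                                   # uniform interior grid spacing
    main = np.full(n, 2.0 * D / h**2 + sigma_a)
    off = np.full(n - 1, -D / h**2)
    A = np.diag(main) + np.diag(off, 1) + np.diag(off, -1)
    B = np.full(n, S)
    C = np.zeros(n)
    C[n // 2] = sigma_d                               # point detector at x = L/2
    return A, B, C


def response(q, L=2.0, n=199):
    """f(q) = C^T A^{-1} B, computed with a linear solve instead of an inverse."""
    A, B, C = assemble(q, L, n)
    theta = np.linalg.solve(A, B)
    return float(C @ theta)


print(response([1.0, 1.0, 1.0, 1.0]), response([1.2, 0.9, 1.0, 1.0]))
```

For $D = \Sigma_a = S = \Sigma_d = 1$ and $L = 2$ the exact midpoint flux is $1 - 1/\cosh(1)$, which the discrete response matches to about $10^{-5}$.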

Second, we consider the evolutionary PDE

$$\frac{\partial u}{\partial t} = \mathcal{N}(u, q) + F(q), \quad x \in \mathcal{D},\ t \in [0, \infty),$$

$$\mathcal{B}(u, q) = G(q), \quad x \in \partial\mathcal{D},\ t \in [0, \infty),$$

$$u(0, x, q) = I(q), \qquad (13.4)$$

discussed in Section 10.2.3. The corresponding weak formulation is

$$\int_{\mathcal{D}} \frac{\partial u}{\partial t} v\,dx + \int_{\mathcal{D}} N(u, q)\,\mathcal{S}(v)\,dx = \int_{\mathcal{D}} F(q)\,v\,dx, \qquad (13.5)$$

which holds for all $v \in V$, where $V$ is an appropriate space of test functions. Here $\mathcal{N}$
and $N$ are potentially nonlinear spatial differential operators, $\mathcal{S}$ is a linear operator
that results from integration by parts, $F$ is a source term, $\mathcal{B}$ is a boundary operator
with boundary data $G$, $I$ specifies initial conditions, and $\mathcal{D}$ is a subset of $\mathbb{R}^d$. The
response is taken to be observations of the state at $(t, x)$ so that

$$y = u(t, x, q). \qquad (13.6)$$

In general, one must solve (13.4) or (13.5) numerically, which can be extremely
expensive for nonlinear operators with 2-D or 3-D spatial domains. For such
applications, surrogate models are required for design, uncertainty quantification, and
control implementation.

For all models, one first seeks to reduce the number of parameters or inputs to
be estimated and propagated to only those that are identifiable or influential in the
sense defined in Definitions 6.1 and 6.2. Noninfluential parameters are then fixed at
nominal values for subsequent model calibration and uncertainty propagation. This
is critical for two reasons: (i) only identifiable parameters and their uncertainties
can be estimated using the frequentist model calibration techniques of Chapter 7 or
Bayesian techniques of Chapter 8 with noninformative priors, and (ii) the parameter
space must be reduced for models such as those arising in neutron transport
or systems biology where p can be on the order of millions. Parameter selection
techniques are detailed in Chapter 6.

The objective of all surrogate models is to construct representations that quantify
the primary features of the high-fidelity model while providing the computational
efficiency required for Bayesian model calibration, uncertainty quantification,
design, and control implementation. Surrogate models can be broadly categorized
in three classes: regression or interpolation-based models, projection-based models,
and hierarchical models. As detailed in Section 13.1, the first class of models is
constructed by treating the original model as a black box from which samples are
drawn to construct efficient input-output relations based on interpolation or regression
theory. These models are often termed data-fit models, response surface models,
emulators, meta-models, or approximation models, and construction techniques include
stochastic collocation, polynomial approximations, radial basis functions, and
Gaussian process or kriging representations. It is important to note that these methods
treat the model as a black box, which facilitates their use for large-scale applications
and general-purpose software packages. The second class of models, commonly
termed reduced-order models, is constructed by projecting states and distributed
parameters onto low-order subspaces in the manner detailed in Section 13.2. We
detail eigenfunction or modal expansions, proper orthogonal decomposition (POD),
and high-dimensional model representation (HDMR) methods. Hierarchical surrogates
are based on techniques including coarser grids, relaxed tolerances, or simplified
physical or biological assumptions. We refer the reader to [77, 87] and the included
references for details regarding this latter class of surrogate models.

13.1 Regression or Interpolation-Based Models 

Certain principles underlying data-fit models (also termed response surface models, emulators, meta-models, or approximation models) are illustrated in Figure 13.1. The high-fidelity image is analogous to a highly resolved simulation code or fully characterized physical phenomenon from which samples are drawn to construct the underresolved images. Despite the limited number of sampled pixels, the partial images include enough information so that the brain is able to construct a surrogate model and identify the image. This is due in part to its ability to enforce structure through symmetry and prior knowledge of faces.

13.1.1 Algebraic Models 

We consider first the construction of an emulator for the algebraic model (13.3), where $q \in \Gamma \subset \mathbb{R}^p$. In general, $f(q)$ can be output from a high-fidelity simulation code that is computationally expensive or a physical process that can be measured for various values of $q$. Response data consist of $M$ realizations or measurements

$$y_m = f(q^m), \quad m = 1, \ldots, M, \qquad (13.7)$$

generated by $M$ realizations of the parameter or input vector. We first note that the techniques used to sample $q$ are critical for the accuracy of the emulator; e.g., $M$ evaluations of the same parameter vector would yield a terrible emulator. Examples of appropriate sampling strategies include the Monte Carlo, Latin hypercube, and sparse grid techniques detailed in Chapter 11. Second, we note that for high-fidelity simulations, $f(q)$ is treated as a black box, so sampling is nonintrusive in the sense that it does not require modification of existing software packages. This permits the direct use of legacy codes or executable files, which is a significant advantage for many problems.

Figure 13.1. High-fidelity image and poorly resolved sampled images.

The objective is to use the sampled data $(q^m, y_m)$ to construct an emulator $\hat f(q)$ that approximates $f(q)$ with sufficient accuracy while providing the efficiency to repeatedly predict responses for new values of $q$ as required for optimization, uncertainty quantification, or control implementation. Whereas there exist a number of techniques for constructing emulators, we focus on quadratic response surface models, kriging or Gaussian process models, and radial basis formulations. Emulators based on stochastic polynomial expansions are discussed in Section 13.1.2 in the context of the evolutionary PDE (13.4).

To motivate the statistical framework used to construct emulators, consider the response depicted in Figure 13.2(a). This represents high-fidelity simulations or experiments that have been resolved at a very fine scale, along with sampled data. The fine-scale resolution may be idealized and, for applications such as numerical or experimental resolution of turbulent flows, cannot typically be achieved for operations requiring numerous simulations.

To accommodate unresolved fine-scale behavior, as well as experimental measurement errors, we consider the deterministic response to be a realization of a random process quantified by the statistical model

$$Y_m = f(q^m) + \varepsilon_m, \quad m = 1, \ldots, M, \qquad (13.8)$$

where $Y_m$ are random variables with realizations $y_m$. We denote the vector of observations by $y_s = [y_1, \ldots, y_M]^T$. Here $\varepsilon = [\varepsilon_1, \ldots, \varepsilon_M]^T$ is a random vector that accounts for unresolved fine-scale behavior or measurement errors. We assume that the $\varepsilon_m$ are iid and normally distributed with mean 0 and true but unknown variance $\sigma_0^2$, so that $\varepsilon_m \sim N(0, \sigma_0^2)$. We note that the use of the statistical model (13.8) does not imply that the unresolved fine-scale behavior is truly random; rather, it provides a framework for constructing interpolation and regression models when fine-scale behavior is difficult or impossible to incorporate.



Figure 13.2. (a) High-fidelity simulations or measurements, data, and emulator $\hat f(q)$ and (b) quadratic emulator.
Quadratic Response Surface Model

The construction of a polynomial response surface model can be posed in the linear regression framework detailed in Chapter 7. To balance efficiency and accuracy, we illustrate the quadratic emulator

$$\hat f(q, \beta) = \beta_0 + \sum_{i=1}^{p} \beta_i q_i + \sum_{i=1}^{p} \beta_{ii} q_i^2 + \sum_{i=1}^{p} \sum_{j>i}^{p} \beta_{ij} q_i q_j, \qquad (13.9)$$

where $q = [q_1, \ldots, q_p]^T$ is fixed and known and $\beta_0$, $\beta_i$, $\beta_{ii}$, and $\beta_{ij}$ are unknown deterministic parameters for which we will construct an estimator. Since there are $P = (p+1)(p+2)/2$ coefficients, we need $M \ge P$ samples from our high-fidelity simulation code or experimental measurements. The linear regression theory of Chapter 7 yields the least squares estimate

$$\hat\beta = (X^T X)^{-1} X^T y_s, \qquad (13.10)$$

where the $m$th row of the $M \times P$ design matrix $X$ contains the $P$ monomials $1$, $q_i^m$, $(q_i^m)^2$, and $q_i^m q_j^m$ of (13.9) evaluated at $q^m$. Once $\hat\beta$ has been constructed, (13.9) can be used to predict responses for new values of $q$.
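As a sketch of (13.9) and (13.10), the following assumes NumPy and a hypothetical test response that is itself quadratic, so the least squares fit should recover its coefficients exactly. The helper names `quadratic_features`, `fit_quadratic`, and `predict_quadratic` are illustrative, and `numpy.linalg.lstsq` is used in place of forming $(X^T X)^{-1}$ explicitly, which is numerically preferable but yields the same estimate when $X$ has full column rank.

```python
import itertools

import numpy as np


def quadratic_features(Q):
    """Design matrix X for the quadratic emulator (13.9).

    Q has shape (M, p); row m is the sample q^m. Columns are ordered as
    1, q_i (i = 1..p), q_i^2 (i = 1..p), and q_i q_j (i < j), so X has
    P = (p + 1)(p + 2)/2 columns.
    """
    Q = np.atleast_2d(Q)
    M, p = Q.shape
    cols = [np.ones(M)]
    cols += [Q[:, i] for i in range(p)]
    cols += [Q[:, i] ** 2 for i in range(p)]
    cols += [Q[:, i] * Q[:, j] for i, j in itertools.combinations(range(p), 2)]
    return np.column_stack(cols)


def fit_quadratic(Q, y):
    """Least squares estimate (13.10), computed with lstsq rather than (X^T X)^{-1}."""
    X = quadratic_features(Q)
    if X.shape[0] < X.shape[1]:
        raise ValueError("need at least P = (p+1)(p+2)/2 samples")
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def predict_quadratic(beta, Q):
    """Evaluate the fitted emulator at new inputs (rows of Q)."""
    return quadratic_features(Q) @ beta


# Hypothetical "high-fidelity" response on [0, 1]^2; it is exactly quadratic,
# so the fit should recover beta = [1, 2, -1, 0, 0, 0.5].
def f(Q):
    return 1.0 + 2.0 * Q[:, 0] - Q[:, 1] + 0.5 * Q[:, 0] * Q[:, 1]


rng = np.random.default_rng(0)
Q_train = rng.uniform(0.0, 1.0, size=(30, 2))
beta_hat = fit_quadratic(Q_train, f(Q_train))
```

For a response that is not exactly quadratic, the same code returns the least squares compromise, which is the limited-accuracy situation illustrated in Figure 13.2(b).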


Quadratic emulators are popular surrogates for large-scale optimization problems since they provide analytic values for optima. However, they can have limited accuracy for high-fidelity models such as that illustrated in Figure 13.2(b), which motivates consideration of higher-order polynomial, Gaussian process, or radial basis representations.


Kriging Model 

Kriging estimation, which is also termed Gaussian process regression, originated in the geophysical sciences with the work of South African mining engineer Danie Krige. We again consider the statistical model (13.8), where the kriging emulator

$$\hat f(q, \beta) = g^T(q)\beta + Z(q) \qquad (13.11)$$

is composed of a deterministic trend function $g^T(q)\beta$ and a Gaussian process error model $Z(q)$, as illustrated in Figure 13.3(a). In ordinary kriging, the trend function is assumed constant, so $g^T(q)\beta = \beta_0$, whereas universal kriging assumes a polynomial representation $g^T(q)\beta = \sum_{k=0}^{K} \beta_k g_k(q)$ with coefficients $\beta = [\beta_0, \ldots, \beta_K]^T$ determined using least squares regression. To simplify the discussion, we consider ordinary kriging, where we construct an estimator $\hat\beta_0$ for $\beta_0$, and refer readers to [212] for details regarding universal kriging.

The Gaussian process ensures that, in the absence of measurement errors, the emulator interpolates, and hence has zero uncertainty, at the sample points $q^m$; that is, $\hat f(q^m) = y_m$ in the absence of measurement noise. The process $Z(\cdot)$ is assumed to be a stationary random process, as defined in Definition 4.45, with zero mean, variance $\sigma^2$, and nonzero covariance

$$\mathrm{cov}[Z(q^i), Z(q^j)] = \sigma^2 R(q^i, q^j) + \sigma_0^2\, \delta(q^i - q^j), \qquad (13.12)$$

where

$$\delta(q^i - q^j) = \begin{cases} 1, & q^i - q^j = 0, \\ 0, & \text{else}, \end{cases}$$

and

$$R(q^i, q^j) = \exp\Big(-\sum_{k=1}^{p} \theta_k \big|q_k^i - q_k^j\big|^{\gamma_k}\Big), \quad 0 < \gamma_k \le 2, \ \theta_k > 0, \qquad (13.13)$$

is the correlation function. The hyperparameters $\theta_k$, $\gamma_k$, $k = 1, \ldots, p$, can be tuned to achieve varying degrees of correlation, as illustrated in Figure 13.4. The ratio $\eta = \sigma_0^2/\sigma^2$ of the measurement-noise variance to the unadjusted variance is sometimes termed the nugget. Alternative choices for the correlation function are provided in [212].

Figure 13.3. (a) Interpolatory kriging emulator in the absence of measurement error and (b) uncertainty bounds.

Figure 13.4. Correlation function $R$ given by (13.13) for $q \in \mathbb{R}$: (a) fixed $\gamma = 1.5$ and (b) fixed $\theta = 2$.

As detailed in [212], the kriging prediction $\hat f(q, \hat\beta_0)$, for new values of $q$, is given by

$$\hat f(q, \hat\beta_0) = \hat\beta_0 + r^T(q)\, \mathcal{R}^{-1} \big[y_s - \hat\beta_0 \mathbf{1}\big], \qquad (13.14)$$

where

$$\hat\beta_0(\theta, \gamma) = \big[\mathbf{1}^T \mathcal{R}^{-1} \mathbf{1}\big]^{-1} \mathbf{1}^T \mathcal{R}^{-1} y_s \qquad (13.15)$$

is the least squares estimate for $\beta_0$. Here $\mathbf{1} = [1, \ldots, 1]^T \in \mathbb{R}^M$ and $\mathcal{R}$ is an $M \times M$ correlation matrix defined componentwise by $\mathcal{R}_{ij} = R(q^i, q^j)$. The $M \times 1$ vector $r(q)$, with components $r_m = R(q^m, q)$, quantifies the Gaussian process correlations between values at the design sites and the new input $q$. We note that the second term in (13.14) employs the data to adjust the mean and provide a best linear unbiased estimate of the true or high-fidelity function.

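Additionally, the theory provides mean squared errors for predictions, which can guide sample point refinement. In the absence of measurement noise $\varepsilon$, the adjusted kriging emulator variance is

$$\mathrm{var}[\hat f(q, \hat\beta_0)] = s^2 \left[1 - r^T \mathcal{R}^{-1} r + \frac{\big(r^T \mathcal{R}^{-1} \mathbf{1} - 1\big)^2}{\mathbf{1}^T \mathcal{R}^{-1} \mathbf{1}}\right], \qquad (13.16)$$

where the MLE of the unadjusted variance is

$$s^2 = \frac{1}{M} \big[y_s - \hat\beta_0 \mathbf{1}\big]^T \mathcal{R}^{-1} \big[y_s - \hat\beta_0 \mathbf{1}\big]. \qquad (13.17)$$

The use of the variance estimate to construct confidence intervals for predictions is illustrated in Figure 13.3(b).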
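The predictor (13.14), the estimates (13.15) and (13.17), and the variance (13.16) can be sketched in NumPy as follows, assuming the correlation (13.13), no measurement noise, and hyperparameters $\theta$ and $\gamma$ that have already been chosen. The function names, the sine test response, and the values $\theta = 20$, $\gamma = 2$ are hypothetical. A Cholesky factorization of $\mathcal{R}$ stands in for explicit inversion; it raises an error if $\mathcal{R}$ is not numerically positive definite, which is the ill-conditioning issue discussed below.

```python
import numpy as np


def corr(QA, QB, theta, gamma):
    """Correlation (13.13) between every row of QA and every row of QB."""
    QA, QB = np.atleast_2d(QA), np.atleast_2d(QB)
    diff = np.abs(QA[:, None, :] - QB[None, :, :])  # shape (len(QA), len(QB), p)
    return np.exp(-np.sum(theta * diff ** gamma, axis=2))


def ordinary_kriging(Q, y, theta, gamma):
    """Fit an ordinary kriging emulator without measurement noise.

    Returns beta0_hat from (13.15), s^2 from (13.17), and a function that
    evaluates the prediction (13.14) and variance (13.16) at new inputs.
    """
    M = len(y)
    one = np.ones(M)
    # Cholesky factor R = L L^T; raises LinAlgError if R is not numerically
    # positive definite.
    L = np.linalg.cholesky(corr(Q, Q, theta, gamma))

    def solve(b):
        # R^{-1} b via the two triangular factors.
        return np.linalg.solve(L.T, np.linalg.solve(L, b))

    Rinv_one = solve(one)
    beta0 = (Rinv_one @ y) / (one @ Rinv_one)          # (13.15)
    resid = y - beta0 * one
    Rinv_resid = solve(resid)
    s2 = (resid @ Rinv_resid) / M                       # (13.17)

    def predict(Q_new):
        r = corr(Q, Q_new, theta, gamma)                # column k is r(q) for new point k
        mean = beta0 + r.T @ Rinv_resid                 # (13.14)
        Rinv_r = solve(r)
        var = s2 * (1.0 - np.sum(r * Rinv_r, axis=0)
                    + (one @ Rinv_r - 1.0) ** 2 / (one @ Rinv_one))  # (13.16)
        return mean, var

    return beta0, s2, predict


# Hypothetical 1-D example with 8 equally spaced design sites.
Q = np.linspace(0.0, 1.0, 8)[:, None]
y = np.sin(2.0 * np.pi * Q[:, 0])
beta0, s2, predict = ordinary_kriging(Q, y, theta=np.array([20.0]), gamma=np.array([2.0]))
mean_nodes, var_nodes = predict(Q)
```

At the design sites the predicted mean reproduces $y_m$ and the variance vanishes up to rounding, which is the interpolation property depicted in Figure 13.3(a); between sites the variance is positive.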

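To construct $s^2$ and hence $\mathrm{var}[\hat f(q, \hat\beta_0)]$, one must first estimate the hyperparameters $\theta = [\theta_1, \ldots, \theta_p]$ and $\gamma = [\gamma_1, \ldots, \gamma_p]$. One approach is to estimate them using the MCMC methods detailed in Chapter 8. A more common approach for kriging models is to estimate them by maximizing an appropriate likelihood function. Based on the assumption that the sampled data are drawn from a Gaussian process, the likelihood function is

$$L(\beta_0, s^2 \,|\, \theta, \gamma) = \frac{1}{(2\pi s^2)^{M/2} |\mathcal{R}|^{1/2}} \exp\left(-\frac{[y_s - \beta_0 \mathbf{1}]^T \mathcal{R}^{-1} [y_s - \beta_0 \mathbf{1}]}{2 s^2}\right),$$

so that the log-likelihood is

$$\ell(\beta_0, s^2 \,|\, \theta, \gamma) = -\frac{M}{2} \ln(2\pi) - \frac{M}{2} \ln(s^2) - \frac{1}{2} \ln|\mathcal{R}| - \frac{1}{2s^2} [y_s - \beta_0 \mathbf{1}]^T \mathcal{R}^{-1} [y_s - \beta_0 \mathbf{1}]. \qquad (13.18)$$

The necessary conditions $\partial\ell/\partial s^2 = 0$ and $\partial\ell/\partial\beta_0 = 0$ respectively yield (13.17) and (13.15). We then substitute (13.15) and (13.17) into (13.18) to obtain

$$\ell(\theta, \gamma) = -\frac{M}{2} \big[\ln(2\pi) + 1\big] - \frac{M}{2} \ln\big(s^2(\theta, \gamma)\big) - \frac{1}{2} \ln\big|\mathcal{R}(\theta, \gamma)\big|. \qquad (13.19)$$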
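A minimal NumPy sketch of (13.18) and (13.19) follows, assuming the correlation (13.13). The function names are hypothetical, `numpy.linalg.slogdet` evaluates $\ln|\mathcal{R}|$ without overflow, and the coarse grid search over a single $\theta$ (with $\gamma = 2$ fixed) is only a stand-in for the optimization routines mentioned next, not a recommended strategy. At the estimates (13.15) and (13.17), the full log-likelihood (13.18) should coincide with the concentrated form (13.19), and perturbing either estimate should lower it.

```python
import numpy as np


def corr_matrix(Q, theta, gamma):
    """M x M correlation matrix with entries R(q^i, q^j) from (13.13)."""
    Q = np.atleast_2d(Q)
    diff = np.abs(Q[:, None, :] - Q[None, :, :])
    return np.exp(-np.sum(theta * diff ** gamma, axis=2))


def mle_beta0_s2(Q, y, theta, gamma):
    """Closed-form maximizers (13.15) and (13.17) for fixed (theta, gamma)."""
    R = corr_matrix(Q, theta, gamma)
    one = np.ones(len(y))
    Rinv_one = np.linalg.solve(R, one)
    beta0 = (Rinv_one @ y) / (one @ Rinv_one)
    resid = y - beta0 * one
    s2 = resid @ np.linalg.solve(R, resid) / len(y)
    return beta0, s2


def loglik(Q, y, theta, gamma, beta0, s2):
    """Full log-likelihood (13.18)."""
    M = len(y)
    R = corr_matrix(Q, theta, gamma)
    resid = y - beta0
    _, logdet = np.linalg.slogdet(R)        # ln|R| without forming |R|
    quad = resid @ np.linalg.solve(R, resid)
    return (-0.5 * M * np.log(2.0 * np.pi) - 0.5 * M * np.log(s2)
            - 0.5 * logdet - quad / (2.0 * s2))


def concentrated_loglik(Q, y, theta, gamma):
    """Concentrated log-likelihood (13.19), a function of (theta, gamma) only."""
    M = len(y)
    _, s2 = mle_beta0_s2(Q, y, theta, gamma)
    _, logdet = np.linalg.slogdet(corr_matrix(Q, theta, gamma))
    return -0.5 * M * (np.log(2.0 * np.pi) + 1.0) - 0.5 * M * np.log(s2) - 0.5 * logdet


# Illustrative coarse grid search over a single theta with gamma = 2 fixed.
Q = np.linspace(0.0, 1.0, 10)[:, None]
y = np.sin(2.0 * np.pi * Q[:, 0])
gamma = np.array([2.0])
thetas = np.logspace(1.5, 3.0, 16)
scores = [concentrated_loglik(Q, y, np.array([t]), gamma) for t in thetas]
theta_hat = thetas[int(np.argmax(scores))]
```

The grid is kept away from small $\theta$ because $\mathcal{R}$ approaches a matrix of ones there and becomes severely ill-conditioned.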
Whereas we cannot maximize (13.19) analytically, one can readily do so using various optimization routines.

Issues associated with Gaussian process or kriging representations include ill-conditioning of $\mathcal{R}$, growth in the number of hyperparameters ($2p$ for $\theta$ and $\gamma$) for large parameter dimensions $p$, multiple local maxima, and ridges near maximum values. The final two issues motivate the use of global optimization routines. Ill-conditioning of $\mathcal{R}$ is addressed through attention to sampling algorithms and adding nuggets $\eta$ to the diagonal of $\mathcal{R}$ until $\mathcal{R} + \eta I$ is no longer ill-conditioned. However, this latter strategy smooths the kriging model so that it approximates rather than interpolates the sampled data. The curse of dimensionality for large $p$ can dictate that other techniques be employed to construct surrogate models for high-dimensional problems.

Details regarding tuning strategies for the hyperparameters can be found in [247], whereas extensions of the kriging formulation that include gradient information are detailed in [85].


Radial Basis Functions 


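As an alternative to the polynomial basis expansion (13.9), one can employ the representation

$$\hat f(q) = \sum_{m=1}^{M} \beta_m \psi_m(q) + P(q), \qquad (13.20)$$

where $\psi_m(q) = \psi(\|q^m - q\|)$ are radial basis functions defined in terms of the Euclidean distance between $q^m$ and $q$, and $P(q)$ is a global trend function. For this discussion, we take $P(q) = \beta_0$ as we did for the kriging model. For $r_m = \|q^m - q\|$, specific choices for $\psi$ include

$$\psi(r) = \begin{cases} e^{-r^2/2c^2}, & \text{Gaussian}, \\ r^n, \ n = 1, 2, 3, & \text{Power law}, \\ r^2 \ln r, & \text{Thin plate spline}. \end{cases} \qquad (13.21)$$

To specify the coefficients $\beta_m$, we enforce the interpolation condition

$$\hat f(q^m, \beta) = y_m, \quad m = 1, \ldots, M,$$

along with the constraint

$$\sum_{m=1}^{M} \beta_m = 0,$$

which results from the inclusion of the trend function. This yields the radial basis function prediction

$$\hat f(q, \hat\beta) = \hat\beta_0 + \psi^T(q)\, \Psi^{-1} \big[y_s - \hat\beta_0 \mathbf{1}\big], \qquad (13.22)$$

where $\hat\beta_0 = \big[\mathbf{1}^T \Psi^{-1} \mathbf{1}\big]^{-1} \mathbf{1}^T \Psi^{-1} y_s$, $\Psi$ is the $M \times M$ Gram matrix with entries $\Psi_{ij} = \psi(\|q^i - q^j\|)$, and $\psi(q) = [\psi_1(q), \ldots, \psi_M(q)]^T$.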

A comparison of (13.22) and (13.14) illustrates that the radial basis function expansion and kriging model have essentially the same form. The difference lies in the fact that the radial basis function model is formulated in terms of the Gram matrix and vector, constructed in terms of the basis functions, whereas the kriging model employs the correlation matrix and vector. The kriging framework also has the advantage that it provides variance estimates for the prediction.
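The following NumPy sketch implements (13.22) with only the Gaussian choice from (13.21), for which the Gram matrix $\Psi$ is symmetric positive definite at distinct sites. The shape parameter $c = 0.3$, the $4 \times 4$ grid of sites, the test response, and the function names are hypothetical. Both the interpolation conditions and the constraint $\sum_m \beta_m = 0$ can be checked numerically on the result.

```python
import numpy as np


def gaussian_rbf(r, c):
    """Gaussian choice psi(r) = exp(-r^2 / (2 c^2)) from (13.21)."""
    return np.exp(-r ** 2 / (2.0 * c ** 2))


def rbf_fit(Q, y, c=0.3):
    """Interpolating RBF emulator (13.20) with constant trend P(q) = beta0.

    Enforces the interpolation conditions together with sum_m beta_m = 0 and
    returns beta0_hat, beta, and a predictor implementing (13.22).
    """
    Q = np.atleast_2d(Q)
    dist = np.linalg.norm(Q[:, None, :] - Q[None, :, :], axis=2)
    Psi = gaussian_rbf(dist, c)                     # Gram matrix Psi_ij
    one = np.ones(len(y))
    Psi_inv_one = np.linalg.solve(Psi, one)
    beta0 = (Psi_inv_one @ y) / (one @ Psi_inv_one)
    beta = np.linalg.solve(Psi, y - beta0 * one)    # beta = Psi^{-1}(y_s - beta0 1)

    def predict(Q_new):
        Q_new = np.atleast_2d(Q_new)
        d = np.linalg.norm(Q_new[:, None, :] - Q[None, :, :], axis=2)
        return beta0 + gaussian_rbf(d, c) @ beta    # row k is psi^T(q) for new point k

    return beta0, beta, predict


# Hypothetical test: a 4 x 4 grid on [0, 1]^2 and a smooth response.
g = np.linspace(0.0, 1.0, 4)
Q_grid = np.array([[a, b] for a in g for b in g])
y_grid = np.sin(np.pi * Q_grid[:, 0]) * np.cos(np.pi * Q_grid[:, 1])
beta0, beta, predict = rbf_fit(Q_grid, y_grid)
```

Structurally, this is the kriging predictor with $\Psi$ in place of $\mathcal{R}$, which is the correspondence noted in the comparison of (13.22) and (13.14).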


13.1.2 Evolutionary PDEs 


Here we consider the evolutionary PDE (13.4) or (13.5) with state observations (13.6). Due to the computational expense of solving PDEs with potentially nonlinear operators on 2-D or 3-D spatial domains, emulators are required for Bayesian model calibration and uncertainty propagation. In fact, we have already constructed such emulators in Section 10.2.3, and we simply highlight here the manner in which those constructs achieve the objectives of surrogate models.

Here the surrogate model is taken to be

$$u^K(t, x, q) = \sum_{k=1}^{K} \sum_{j=1}^{J} u_{jk}(t)\, \phi_j(x)\, \Psi_k(q), \qquad (13.23)$$

where $\phi_j(x)$ are finite element or spectral basis functions and the input basis functions $\Psi_k(q) = L_k(q)$ are Lagrange polynomials that collocate at the $M = K$ quadrature points $q^m = q^k$; that is, $L_k(q^m) = \delta_{km}$, in the manner shown in Figure 13.5. We note that the input basis representation (13.23) is analogous to the algebraic expansions (13.9) and (13.20) with Lagrange polynomials used instead of quadratic polynomials or radial basis functions. The coefficients $u_{jk}(t)$ are determined by solving the $MJ$ evolution equations



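$$\int_{\mathcal{D}} \Big(\sum_{j=1}^{J} \dot u_{jm}(t)\, \phi_j(x)\Big) \phi_\ell(x)\, dx + \int_{\mathcal{D}} N\Big(\sum_{j=1}^{J} u_{jm}(t)\, \phi_j(x), q^m\Big) S\big(\phi_\ell(x)\big)\, dx = \int_{\mathcal{D}} F(q^m)\, \phi_\ell(x)\, dx \qquad (13.24)$$

for $\ell = 1, \ldots, J$ and $m = 1, \ldots, M$. The points $q^m = q^k$ are specified using the Monte Carlo or sparse grid techniques detailed in Chapter 11.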



Figure 13.5. Collocation of the surrogate model (13.23) through data generated using a high-fidelity model.
The surrogate model (13.23) is thus constructed using solutions of the high-fidelity simulation code with the number of required solutions minimized through
the use of sparse grid techniques for moderate parameter dimensions. The use of 
this emulator for uncertainty propagation is detailed in Chapters 3 and 10. 
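To make the parameter-space collocation in (13.23) concrete, the sketch below replaces the PDE solver with a hypothetical scalar model $u(t, q) = e^{-qt}$, the solution of $du/dt = -qu$, $u(0) = 1$, so there is no spatial basis ($J = 1$). It assumes, purely for illustration, $K = 6$ Gauss-Legendre points on $q \in [1, 2]$ as the collocation set; each point costs one "high-fidelity" solve, and the Lagrange polynomials satisfy $L_k(q^m) = \delta_{km}$. The names `lagrange_basis`, `high_fidelity`, and `surrogate` are illustrative.

```python
import numpy as np


def lagrange_basis(nodes, q):
    """Return the matrix [L_k(q_i)] of Lagrange polynomials for the given nodes."""
    q = np.atleast_1d(q)
    K = len(nodes)
    L = np.ones((len(q), K))
    for k in range(K):
        for j in range(K):
            if j != k:
                L[:, k] *= (q - nodes[j]) / (nodes[k] - nodes[j])
    return L


def high_fidelity(t, q):
    """Hypothetical stand-in for an expensive solver: u' = -q u, u(0) = 1."""
    return np.exp(-q * t)


K = 6
xi, _ = np.polynomial.legendre.leggauss(K)   # Gauss-Legendre points on [-1, 1]
nodes = 1.5 + 0.5 * xi                        # collocation points q^k in [1, 2]
t = np.linspace(0.0, 2.0, 21)

# One "high-fidelity" solve per collocation point: column k holds u(t, q^k).
U = np.column_stack([high_fidelity(t, qk) for qk in nodes])


def surrogate(q):
    """Evaluate sum_k u(t, q^k) L_k(q); returns shape (len(t), len(q))."""
    return U @ lagrange_basis(nodes, q).T
```

Only six evaluations of the stand-in model are needed because the response is smooth in $q$; this mirrors the role of sparse grids in limiting the number of high-fidelity solves, although a one-dimensional Gauss rule is not itself a sparse grid.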

Remark 13.1. This is a surrogate in the parameter space but not in the spatial domain. For 2-D or 3-D spatial domains $\mathcal{D}$, the spatial discretization level $J$ can easily be on the order of millions or higher, which can render the solution of (13.24) prohibitively expensive, especially for nonlinear operators $\mathcal{N}$ and $N$. This is addressed for certain problems by the projection-based reduced-order methods.


13.2 Projection-Based Models

Reduced-order models are constructed by projecting high-dimensional states $u$ and parameters $q$ onto low-dimensional subspaces in the manner depicted in Figure 13.6. This is in contrast to interpolation or regression-based data-driven models.

Figure 13.6. Projection of the infinite-dimensional problem (13.26) onto the finite-dimensional subspace $V^J$ and reduced-order subspace $V^{J_r}$.

To motivate, we again consider the evolutionary PDE

$$\begin{aligned} \frac{\partial u}{\partial t} &= \mathcal{N}(u, q) + F(q), && x \in \mathcal{D},\ t \in [0, \infty), \\ \mathcal{B}(u, q) &= G(q), && x \in \partial\mathcal{D},\ t \in [0, \infty), \\ u(0, x, q) &= u_0(x, q), && x \in \mathcal{D}, \end{aligned} \qquad (13.25)$$

which has the deterministic weak formulation

$$\int_{\mathcal{D}} \frac{\partial u}{\partial t} v\, dx + \int_{\mathcal{D}} N(u, q) S(v)\, dx = \int_{\mathcal{D}} F(q) v\, dx, \quad v \in V. \qquad (13.26)$$

Here $q \in \Gamma \subset \mathbb{R}^p$ and $V$ is an appropriate space of test functions. To approximate the state $u(t, x, q)$, we project the problem onto the finite-dimensional subspace $V^J = \mathrm{span}\{\phi_j\} \subset V$, where $\phi_j(x)$ are finite element or spectral basis functions,
and consider the approximate solution

$$u^J(t, x, q) = \sum_{j=1}^{J} u_j(t)\, \phi_j(x), \qquad (13.27)$$

which must satisfy

$$\int_{\mathcal{D}} \frac{\partial u^J}{\partial t} \phi_\ell\, dx + \int_{\mathcal{D}} N(u^J, q) S(\phi_\ell)\, dx = \int_{\mathcal{D}} F(q) \phi_\ell\, dx \qquad (13.28)$$

for $\ell = 1, \ldots, J$. The difficulty is that for $\mathcal{D} \subset \mathbb{R}^2$ or $\mathbb{R}^3$, $J$ can easily be on the order of $10^6$ to $10^8$, which, when combined with the potentially nonlinear spatial differential operator $N$, can yield simulations that take hours to days to complete.

The goal with reduced-order methods is to project (13.28) onto a significantly lower-dimensional subspace $V^{J_r} = \mathrm{span}\{\phi_j^r\} \subset V^J$, $J_r \ll J$, so that the solution of

$$\int_{\mathcal{D}} \frac{\partial u^{J_r}}{\partial t} \phi_\ell^r\, dx + \int_{\mathcal{D}} N(u^{J_r}, q) S(\phi_\ell^r)\, dx = \int_{\mathcal{D}} F(q) \phi_\ell^r\, dx, \quad \ell = 1, \ldots, J_r, \qquad (13.29)$$

with reduced-order solutions

$$u(t, x, q) \approx u^{J_r}(t, x, q) = \sum_{j=1}^{J_r} u_j^r(t)\, \phi_j^r(x) \qquad (13.30)$$

is sufficiently efficient to permit optimization, uncertainty quantification, and model-based control implementation.

Initial Conditions 

To construct initial conditions for the reduced-order model, it is necessary to project $u(0, x, q) = u_0(x, q)$ onto $V^{J_r}$. To this end, we assume that for each $t$, $u(t, x)$ is an element in the Hilbert space $X$ having an inner product $\langle \cdot, \cdot \rangle$ and norm $\|\cdot\|$, as detailed in Definition A.3. The semidiscretization of (13.25) can then be expressed as

$$\begin{aligned} \frac{du^{J_r}}{dt} &= P^{J_r} \mathcal{N}\big(P^{J_r} u^{J_r}\big) + P^{J_r} F, \\ u^{J_r}(0, x) &= P^{J_r} u_0(x), \end{aligned} \qquad (13.31)$$

where $P^{J_r}$ is a projection operator from $X$ onto $V^{J_r}$ and we suppress parameter dependence.

For Galerkin approximation, the projection operator $P^{J_r}: X \to V^{J_r}$ is given by

$$P^{J_r} f = \sum_{j=1}^{J_r} \eta_j \phi_j^r, \qquad (13.32)$$

where $f$ is an arbitrary element in $X$. The vector $\eta = [\eta_1, \ldots, \eta_{J_r}]^T$ is given by

$$\eta = M^{-1} F,$$

where

$$M_{ij} = \langle \phi_j^r, \phi_i^r \rangle \quad \text{and} \quad F_i = \langle f, \phi_i^r \rangle$$

for $i, j = 1, \ldots, J_r$. We note that $M$ is typically termed the mass matrix. This representation for $\eta$ results from the orthogonality property

$$\langle P^{J_r} f - f, g \rangle = 0,$$

which holds for all $g \in V^{J_r}$. Details regarding the matrix representation $L^{J_r}$ for projected linear operators $L$ can be found in Section 11.2 of [25].
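A small numerical check of (13.32) and the mass-matrix formula follows. It assumes (this is not from the text) that $X = L^2(0, 1)$ with the inner product approximated by a 20-point Gauss-Legendre rule, and that the reduced basis is the hypothetical non-orthogonal set $\{1, x, x^2\}$, whose exact mass matrix is the $3 \times 3$ Hilbert matrix. Because the same rule defines both $M$ and $F$, the orthogonality property holds to rounding error; the name `galerkin_projection` is illustrative.

```python
import numpy as np

# Gauss-Legendre rule mapped to [0, 1]; it defines the discrete inner product
# <f, g> ~ sum_i w_i f(x_i) g(x_i) used for both M and F below.
xq, wq = np.polynomial.legendre.leggauss(20)
xq, wq = 0.5 * (xq + 1.0), 0.5 * wq


def galerkin_projection(f, basis):
    """Coefficients eta = M^{-1} F of the projection (13.32) onto span(basis)."""
    Phi = np.column_stack([phi(xq) for phi in basis])  # Phi[i, j] = phi_j(x_i)
    mass = Phi.T @ (wq[:, None] * Phi)                 # M_ij = <phi_j, phi_i>
    F = Phi.T @ (wq * f(xq))                           # F_i = <f, phi_i>
    return np.linalg.solve(mass, F), mass


# Hypothetical non-orthogonal reduced basis {1, x, x^2} on [0, 1].
basis = [lambda x: np.ones_like(x), lambda x: x, lambda x: x ** 2]
eta, mass = galerkin_projection(np.exp, basis)
```

Projecting a function that already lies in the subspace, such as $1 + 2x + 3x^2$, returns its own coefficients, as the orthogonality property requires.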


Reduced-Order States and Snapshot Sets

We focus on the construction of reduced-order states when the number of states is significantly larger than the number of parameters, $J \gg p$. For models with large parameter dimensionality, one would first apply the parameter selection techniques of Chapter 6 to reduce the dimension of the parameter space.

The crux of reduced-order methods is the construction of a reduced-order basis $\{\phi_j^r\}$ and projection of the problem onto the space $V^{J_r} = \mathrm{span}\{\phi_j^r\} \subset V^J$. We focus on three techniques to construct reduced-order basis functions.


* Eigenfunctions or modes have long been employed as reduced-order basis functions for engineering and science applications, including thermal, structural, acoustic, and fluids problems. We provide an example illustrating this approach in Section 13.3.

* Snapshot-based methods are discussed in Section 13.4. In this approach, high-fidelity numerical codes or experiments are evaluated at several independent variable or parameter values to construct a snapshot set from which a coherent set of reduced-order basis functions is constructed. We focus on proper orthogonal decomposition (POD) methods, which are closely related to Karhunen-Loève expansions, principal component analysis (PCA), and singular value decompositions (SVDs). As detailed in [48], centroidal Voronoi tessellations (CVTs) constitute an alternative snapshot-based method.

* In the high-dimensional model representation (HDMR) techniques discussed in Section 13.5, reduced-order basis functions are constructed by quantifying first- and second-order interactions between input parameters. In ANOVA-HDMR, this is accomplished by integrating over the parameter space, whereas cut-HDMR representations are constructed by evaluating responses or states at reference points. The relation between HDMR techniques and variance-based global sensitivity analysis is detailed in Chapter 15.

All three techniques produce global basis functions, which in turn yield dense systems. This is in contrast to the original systems, which are generally sparse, and necessitates that $J_r$ be truly small so that solution of the dense system is sufficiently efficient for optimization, uncertainty quantification, or control implementation.
13.3 Eigenfunction or Modal Expansions

Eigenfunction or modal expansions constitute a common choice of reduced-order model that has long been employed for simulations and control design in engineering and scientific applications. They are typically constructed from the solution or approximate solution of the original problem using separation of variables.


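Example 13.2. To illustrate, consider the heat equation

$$\begin{aligned} \frac{\partial T}{\partial t} &= \alpha \frac{\partial^2 T}{\partial x^2}, && 0 < x < L,\ t > 0, \\ T(t, 0) &= T(t, L) = 0, && t > 0, \\ T(0, x) &= T_0(x), && 0 < x < L, \end{aligned}$$

which has the solution

$$T(t, x) = \sum_{j=1}^{\infty} \gamma_j e^{-\alpha \lambda_j^2 t} X_j(x)$$

with eigenvalues $\lambda_j = j\pi/L$, eigenfunctions $X_j(x) = \sin(\lambda_j x)$, and coefficients

$$\gamma_j = \frac{2}{L} \int_0^L T_0(x) \sin(\lambda_j x)\, dx.$$

The choice of eigenfunctions as reduced basis functions, $\phi_j^r(x) = \sin(j\pi x/L)$, thus yields the reduced-order model

$$T^{J_r}(t, x) = \sum_{j=1}^{J_r} T_j(t) \sin\Big(\frac{j\pi x}{L}\Big),$$

where $T_j(t)$ are determined by solving

$$\int_0^L \frac{\partial T^{J_r}}{\partial t} \phi_i^r\, dx + \alpha \int_0^L \frac{\partial T^{J_r}}{\partial x} \frac{d\phi_i^r}{dx}\, dx = 0$$

for $i = 1, \ldots, J_r$. This yields the vector system

$$M \dot{\mathbf{T}} + \mathcal{K} \mathbf{T} = 0, \quad \mathbf{T}(0) = \mathbf{T}_0, \qquad (13.33)$$

where $\mathbf{T}(t) = [T_1(t), \ldots, T_{J_r}(t)]^T$, $M = \frac{L}{2} I$, and $\mathcal{K}$ is a $J_r \times J_r$ diagonal matrix with diagonal elements $\frac{\alpha L}{2}\big(\frac{\pi}{L}\big)^2, \ldots, \frac{\alpha L}{2}\big(\frac{J_r \pi}{L}\big)^2$. The initial condition vector is $\mathbf{T}_0 = [\gamma_1, \ldots, \gamma_{J_r}]^T$. We note that (13.33) reduces to solving the $J_r$ independent equations

$$\frac{dT_j}{dt} = -\alpha \Big(\frac{j\pi}{L}\Big)^2 T_j, \quad T_j(0) = \gamma_j, \qquad (13.34)$$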
for $j = 1, \ldots, J_r$. As detailed in [98], the error $|T(t, x) - T^{J_r}(t, x)|$ for this example converges more rapidly than $e^{-J_r t}$ as $J_r \to \infty$ for any $t > 0$. Hence very few reduced-order basis functions are required to achieve high accuracy.
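The modal reduced-order model of Example 13.2 can be evaluated directly, since (13.34) has the closed-form solution $T_j(t) = \gamma_j e^{-\alpha (j\pi/L)^2 t}$. The sketch below assumes $\alpha = L = 1$ and the hypothetical initial profile $T_0(x) = x(L - x)$, whose coefficients are known in closed form ($\gamma_j = 8L^2/(j\pi)^3$ for odd $j$ and zero for even $j$), which provides a check on the quadrature used for $\gamma_j$. The function names are illustrative.

```python
import numpy as np

alpha, L = 1.0, 1.0                   # hypothetical diffusivity and rod length


def T0(x):
    """Hypothetical initial temperature profile T_0(x) = x (L - x)."""
    return x * (L - x)


def modal_coefficients(T0, Jr, nq=128):
    """gamma_j = (2/L) int_0^L T0(x) sin(j pi x / L) dx for j = 1..Jr (Gauss-Legendre)."""
    xq, wq = np.polynomial.legendre.leggauss(nq)
    xq, wq = 0.5 * L * (xq + 1.0), 0.5 * L * wq        # map the rule to [0, L]
    lam = np.arange(1, Jr + 1) * np.pi / L
    return (2.0 / L) * (np.sin(np.outer(lam, xq)) @ (wq * T0(xq)))


def reduced_model(t, x, gamma):
    """T^{Jr}(t, x) using the exact solutions T_j(t) = gamma_j exp(-alpha lam_j^2 t) of (13.34)."""
    lam = np.arange(1, len(gamma) + 1) * np.pi / L
    Tj = gamma * np.exp(-alpha * lam ** 2 * t)
    return np.sin(np.outer(np.atleast_1d(x), lam)) @ Tj


x = np.linspace(0.0, L, 11)
T_3 = reduced_model(0.1, x, modal_coefficients(T0, 3))
T_40 = reduced_model(0.1, x, modal_coefficients(T0, 40))
```

At $t = 0.1$ the three-mode and forty-mode models agree to far better than $10^{-6}$, because the omitted modes are damped by factors of at least $e^{-25\pi^2 \cdot 0.1}$; at $t = 0$ many more modes are needed, which is why the statement above is restricted to $t > 0$.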
For PDEs arising in applications such as those detailed in Chapters 1 and 3, one typically cannot construct analytic eigenfunction relations but instead must seek numerical approximations. For many structural and fluid applications, the use of analytic or numerical eigenfunction expansions is well established for system representation, optimization, and control design, and their use for uncertainty quantification will certainly grow as the field matures.


13.4 Snapshot-Based Methods including POD 

The techniques used to construct reduced basis functions from numerical simulations or experiments are often categorized as Lagrange, Hermite, or Taylor basis methods.

* Lagrange basis methods employ numerical or experimental state solutions evaluated at various parameter values $q^m$ or independent variable values such as times $t_m$.

* In Hermite methods, one employs both state solutions and their derivatives with respect to parameters, that is, their sensitivities, to construct reduced basis functions. As with Lagrange methods, the state solutions and sensitivities are evaluated at a variety of values for parameters or independent variables.

* Taylor bases are constructed by evaluating the state, sensitivities, and higher derivatives with respect to parameters at a fixed set of parameters and independent variables. This approach is hindered by the difficulty of constructing higher-order sensitivities and the fact that their number grows quickly as the order increases.

We focus primarily on Lagrange basis methods. 


Definition 13.3 (Snapshot Set). The set of numerical or experimental solutions generated at several independent variable values or using various parameter values is termed a snapshot set; see Figure 13.7. For example, $u^J(t_m, x, q)$ and $u^J(t, x, q^m)$ are temporal and parameter snapshot sets for the evolution model (13.26).




Figure 13.7. (a) Numerical or experimental snapshots and (b) snapshots $u_m(x)$ and $u_m \in \mathbb{R}^n$ at times $t_m$.
It is clear that the techniques used to sample parameters or determine evaluation values for independent variables are critical for constructing a basis that is sufficiently rich in the sense that it incorporates all expected system dynamics. This is notably critical for two important uses of models: prediction based on extrapolation and control design. The latter is especially challenging since the application of feedback can introduce dynamics not present in open loop responses. In both cases, one seeks inputs (e.g., impulsive forcing for time-dependent problems) that excite the widest possible range of dynamics, and sampling strategies that adequately incorporate this information. Reasonable parameter sampling strategies include greedy sampling [154] as well as the sparse grid and Monte Carlo techniques detailed in Chapter 11. The evaluation at randomly chosen $q^m$ is also related to the nonlinear parameter selection techniques discussed in Section 6.2.

Whereas there is some theory to guide the choice of snapshot locations [141], the choice of independent variable values at which to generate snapshots often relies on expert knowledge of the application and can be more of an art than an exact science. Statistical theory regarding the design of experiments has been used to guide samples for certain applications, but comprehensive algorithms to guide snapshot construction are still lacking, and this constitutes an active area of research.

Both numerical and experimental snapshot sets can contain significant amounts of redundant information from which the basic structure must be extracted when constructing reduced basis functions. This can be accomplished using the POD techniques discussed here or CVTs [48].

POD with Distributed Observations

We consider first $M$ distributed observations $\{u_m(x)\}_{m=1}^{M}$, numerically or experimentally determined for all $x$ in a domain $\mathcal{D}$. For example, $u_m(x)$ could represent solutions of the evolutionary PDE (13.4) at times $t_m$, as illustrated in Figure 13.7(b). In general, $x$ can be a generic independent variable.

Since one is typically interested in deviations about a mean value, the first step is to construct a modified snapshot set

$$v_m(x) = u_m(x) - \bar u(x), \quad m = 1, \ldots, M, \qquad (13.35)$$

where

$$\bar u(x) = \frac{1}{M} \sum_{m=1}^{M} u_m(x)$$
is the average of the ensemble set. The POD technique provides an algorithm for extracting a compressed description of the behavior encapsulated in the redundant set (13.35). The procedure is closely related to the Karhunen-Loève expansion detailed in Section 10.2, PCA, and the SVD discussed in Section 6.1.

To quantify the behavior encapsulated in the modified snapshot set $\{v_m\}_{m=1}^{M}$, the POD technique extracts from this set a coherent structure which has the largest mean square projection onto the set of observations. This is achieved by seeking basis functions of the form

$$\phi(x) = \sum_{m=1}^{M} \alpha_m v_m(x), \qquad (13.36)$$

where the coefficients $\alpha_m$ are chosen to maximize

$$\frac{1}{M} \sum_{m=1}^{M} \langle v_m, \phi \rangle^2 \quad \text{subject to} \quad \langle \phi, \phi \rangle = \|\phi\|^2 = 1. \qquad (13.37)$$

Here $\langle \cdot, \cdot \rangle$ and $\|\cdot\|$ denote the usual $L^2$ inner product and norm on $\mathcal{D}$. We follow the procedure in [161] and define

$$C(x, y) = \frac{1}{M} \sum_{m=1}^{M} v_m(x) v_m(y), \qquad R\phi = \int_{\mathcal{D}} C(x, y) \phi(y)\, dy \qquad (13.38)$$

for $\phi \in L^2(\mathcal{D})$. As developed in Exercise 13.1, it follows that


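$$\langle R\phi, \phi \rangle = \frac{1}{M} \sum_{m=1}^{M} \langle v_m, \phi \rangle^2 \qquad (13.39)$$

and

$$\langle R\phi, \psi \rangle = \langle \phi, R\psi \rangle$$

for all $\phi, \psi \in L^2(\mathcal{D})$. Because $R$ is a symmetric, nonnegative operator on $L^2(\mathcal{D})$, the problem of maximizing (13.37) is equivalent to finding the largest eigenvalue $\lambda$ of

$$R\phi = \lambda \phi \quad \text{subject to} \quad \|\phi\| = 1 \qquad (13.40)$$

or

$$\int_{\mathcal{D}} C(x, y) \phi(y)\, dy = \lambda \phi(x) \quad \text{with} \quad \|\phi\| = 1. \qquad (13.41)$$

The substitution of (13.36) for $\phi$ into (13.41), with $C$ given by (13.38), yields

$$\sum_{m=1}^{M} \Big[\sum_{k=1}^{M} \frac{1}{M} \int_{\mathcal{D}} v_m(y) v_k(y)\, dy\; \alpha_k\Big] v_m(x) = \lambda \sum_{m=1}^{M} \alpha_m v_m(x),$$

which can be expressed as the eigenvalue problem

$$K V = \lambda V, \qquad (13.42)$$

where

$$K_{mk} = \frac{1}{M} \int_{\mathcal{D}} v_m(x) v_k(x)\, dx \qquad (13.43)$$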
and $V = [\alpha_1, \alpha_2, \ldots, \alpha_M]^T$. Because $K$ is a nonnegative self-adjoint matrix, it has a complete set of orthogonal eigenvectors

$$V_1 = \begin{bmatrix} \alpha_1^1 \\ \vdots \\ \alpha_M^1 \end{bmatrix}, \quad V_2 = \begin{bmatrix} \alpha_1^2 \\ \vdots \\ \alpha_M^2 \end{bmatrix}, \quad \ldots, \quad V_M = \begin{bmatrix} \alpha_1^M \\ \vdots \\ \alpha_M^M \end{bmatrix}$$

with corresponding eigenvalues $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_M \ge 0$.

The relation (13.37) is maximized by

$$\phi_1(x) = \sum_{m=1}^{M} \alpha_m^1 v_m(x),$$

where $\alpha_m^1$ are the elements of the eigenvector $V_1$ corresponding to the largest eigenvalue $\lambda_1$. The remaining functions

$$\phi_j(x) = \sum_{m=1}^{M} \alpha_m^j v_m(x)$$

are constructed using elements of subsequent eigenvectors $V_j$ corresponding to the $j$th ordered, decreasing eigenvalue. These functions are orthogonal but not orthonormal. It is established in Exercise 13.2 that, for unit-length eigenvectors $V_j$, the functions

$$\phi_j(x) = \sum_{m=1}^{M} \frac{\alpha_m^j}{\sqrt{M \lambda_j}}\, v_m(x), \quad j = 1, \ldots, M, \qquad (13.44)$$

form an orthonormal basis set. To construct a reduced-order basis set, one employs the first $J_r$ basis functions, where $J_r \ll M$.

Remark 13.4. The relation (13.41) is precisely the integral equation (5.6) that is solved to construct the Karhunen-Loève representation (5.5) for a correlated, second-order random field $a(x, \omega)$. This illustrates one of the ties between the POD and Karhunen-Loève techniques.


POD with Discrete Observations 

The assumption of distributed measurements $u_m(x)$, $x \in \mathcal{D}$, facilitates discussion in terms of the familiar $L^2$ inner product but is rarely achievable in practice. Instead, one typically measures each snapshot at $n$ points. For example, this would be the case if one has $n$ spatial sensors to measure flow for $M$ input parameter values or at $M$ different times, as illustrated in Figure 13.7(b).

We consider $M$ snapshots $u_m \in \mathbb{R}^n$ along with the modified snapshot set $v_m = u_m - \bar u$, where $\bar u$ again denotes the average of the ensemble set. We summarize the construction of the POD basis functions for the case $M < n$ and note that the theory parallels that for distributed observations with the Euclidean dot product and norm used instead of the $L^2$ inner product and norm.
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We first construct the n x M snapshot matrix 

A — [ui, . . . (.13.45) 

having rank r < uiin{n., A/}, and M x M matrix 

A- - ±A?A. 

Since = 3 this ^ the discrete version of the matrix defined in (13. 43)* 


The POD basis $\{\varphi_j\}_{j=1}^{J_r}$, $J_r \in \{1, \ldots, r\}$, is computed by solving the $M \times M$ eigenvalue problem

$$K v_j = \lambda_j v_j \tag{13.46}$$

and taking

$$\varphi_j = \frac{1}{\sqrt{M\lambda_j}}\, A v_j \tag{13.47}$$

for $j = 1, \ldots, J_r$. The eigenvalues are taken in decreasing order. We note that the eigenvalue problem (13.46) corresponds to (13.42), whereas the POD basis function $\varphi_j$ of (13.47) is the discrete version of $\varphi_j(x)$ defined in (13.44).
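The method of snapshots in (13.45)-(13.47) can be sketched in plain Python, assuming the snapshots are stored as lists of length $n$ with $M < n$. A small cyclic Jacobi eigensolver stands in for a library eigensolver so that the example is self-contained. The function names and the two-mode test data are hypothetical illustrations and do not come from the text.

```python
import math


def jacobi_eig(S, rel_tol=1e-22, max_sweeps=100):
    """Eigenpairs of a small symmetric matrix S (list of lists) via cyclic Jacobi
    rotations. Returns (lam, vecs): lam sorted in decreasing order, and
    vecs[m][j] holding the m-th component of the j-th unit eigenvector."""
    n = len(S)
    a = [row[:] for row in S]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    total = sum(x * x for row in a for x in row)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off <= rel_tol * total:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # rotation angle chosen so that the rotated entry a[p][q] vanishes
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A P (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- P^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # V <- V P accumulates the eigenvectors
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    order = sorted(range(n), key=lambda i: -a[i][i])
    lam = [a[i][i] for i in order]
    vecs = [[v[m][j] for j in order] for m in range(n)]
    return lam, vecs


def pod_method_of_snapshots(snapshots, J_r):
    """POD basis from M snapshots u_m in R^n (M < n assumed).

    Forms v_m = u_m - mean and K = A^T A / M with A = [v_1, ..., v_M], solves
    K v_j = lambda_j v_j (13.46), and returns phi_j = A v_j / sqrt(M lambda_j) (13.47)."""
    M, n = len(snapshots), len(snapshots[0])
    mean = [sum(u[i] for u in snapshots) / M for i in range(n)]
    vs = [[u[i] - mean[i] for i in range(n)] for u in snapshots]
    K = [[sum(vs[m][i] * vs[k][i] for i in range(n)) / M for k in range(M)]
         for m in range(M)]
    lam, vecs = jacobi_eig(K)
    basis = []
    for j in range(J_r):  # requires lam[j] > 0, i.e., J_r <= rank(A)
        scale = 1.0 / math.sqrt(M * lam[j])
        basis.append([scale * sum(vecs[m][j] * vs[m][i] for m in range(M))
                      for i in range(n)])
    return mean, lam, basis


# Hypothetical snapshots built from two spatial modes with amplitudes m and m^2, so rank(A) = 2.
n = 50
xs = [i / (n - 1) for i in range(n)]
snaps = [[m * math.sin(math.pi * x) + m * m * math.sin(2 * math.pi * x) for x in xs]
         for m in range(6)]
mean, lam, phi = pod_method_of_snapshots(snaps, J_r=2)
```

In practice one would call a library eigensolver or SVD routine instead of the Jacobi sweeps. Because the hypothetical data contain only two spatial modes, the eigenvalues beyond the second vanish up to round-off, and the two returned vectors are orthonormal in the Euclidean dot product.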

To relate the POD basis to the SVD, we consider the factorization

$$A = U \Sigma V^T,$$

where $U = [u_1, \ldots, u_n] \in \mathbb{R}^{n \times n}$ and $V \in \mathbb{R}^{M \times M}$ are orthogonal matrices and $\Sigma \in \mathbb{R}^{n \times M}$ has the form

$$\Sigma = \begin{bmatrix} D & 0 \\ 0 & 0 \end{bmatrix},$$

where $D = \operatorname{diag}(\sigma_1, \ldots, \sigma_r) \in \mathbb{R}^{r \times r}$. The singular values are related to the eigenvalues of $K$ by the relation $\sigma_j^2 = M\lambda_j$. It is established in Definition 1.2 and Theorem 1.1 of [267] that the POD basis functions are the first $J_r$ left singular vectors $u_j$ of $A$; that is, $\varphi_j = u_j$.

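One then employs the reduced representation

$$v^{J_r} = \sum_{j=1}^{J_r} \alpha_j \varphi_j,$$

where $J_r \ll M$. From SVD theory, it follows that the error incurred by using the reduced basis set is

$$\varepsilon = \sum_{j=J_r+1}^{M} \sigma_j^2 = M \sum_{j=J_r+1}^{M} \lambda_j.$$

To achieve a relative error less than $\delta$, one chooses the smallest $J_r$ so that

$$\frac{\sum_{j=1}^{J_r} \sigma_j^2}{\sum_{j=1}^{M} \sigma_j^2} = \frac{\sum_{j=1}^{J_r} \lambda_j}{\sum_{j=1}^{M} \lambda_j} \ge 1 - \delta.$$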
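The selection rule for $J_r$ reduces to a cumulative-sum test on the eigenvalues. The sketch below assumes the eigenvalues are already sorted in decreasing order, as they are in (13.46). The function name and the sample eigenvalues are illustrative only.

```python
def smallest_rank(lam, delta):
    """Smallest J_r with sum_{j<=J_r} lam_j / sum_{j<=M} lam_j >= 1 - delta.

    lam holds the eigenvalues of K in decreasing order; tiny negative
    round-off values are clipped to zero before summing."""
    lam = [max(value, 0.0) for value in lam]
    total = sum(lam)
    captured = 0.0
    for J_r, value in enumerate(lam, start=1):
        captured += value
        if captured >= (1.0 - delta) * total:
            return J_r
    return len(lam)


lam = [5.0, 3.0, 1.5, 0.4, 0.1]  # hypothetical eigenvalues, summing to 10
print(smallest_rank(lam, 0.1))   # the first three capture 9.5 >= 9.0, so J_r = 3
```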
For historical reasons, this technique is often referred to as the method of snapshots when $M < n$. If $M > n$, the POD basis is computed directly by solving the eigenvalue problem

$$\frac{1}{M} A A^T u_j = \lambda_j u_j, \quad j = 1, \ldots, J_r,$$

and taking $\varphi_j = u_j$.

The reader is referred to [257] for theory, error analysis, and examples illustrating the relation between POD, SVD, and balanced truncation.

13.5 High-Dimensional Model Representation (HDMR) Techniques

Consider the nonlinear algebraic model

$$y = f(q) \tag{13.48}$$

of (13.3), where $q \in \Gamma \subset \mathbb{R}^p$ and $f$ denotes the output of a high-fidelity simulation code. For this discussion, we assume that the support of each parameter has been mapped to the interval $[0, 1]$ (e.g., using the bijective mapping (11.2)) so that $\Gamma = [0, 1]^p$ is a $p$-dimensional hypercube. HDMR techniques for general domains and densities are detailed in Section 15.1.2. In general, $f(t, x, q)$ can also depend on independent variables such as space and time, but we suppress these dependencies to simplify notation. We note that (13.48) can result from the algebraic representation of a discrete application or from the finite element, finite difference, or finite volume approximation of an ODE or PDE.

To construct an HDMR or Sobol representation for the response $f(q)$, one employs the finite, hierarchical expansion

$$f(q) = f_0 + \sum_{i=1}^{p} f_i(q_i) + \sum_{1 \le i < j \le p} f_{ij}(q_i, q_j) + \cdots + \sum_{1 \le i_1 < \cdots < i_s \le p} f_{i_1 \cdots i_s}(q_{i_1}, \ldots, q_{i_s}) + \cdots + f_{12 \cdots p}(q_1, \ldots, q_p). \tag{13.49}$$

We establish in Remark 15.3 that the constant function $f_0$ is the mean response of $f$, whereas the first-order univariate functions $f_i(q_i)$ represent independent contributions due to the individual parameters. The bivariate functions $f_{ij}(q_i, q_j)$ quantify the interactions of $q_i$ and $q_j$ on the response $y$, with similar interpretations for higher-order interaction terms. The final term $f_{12 \cdots p}(q_1, \ldots, q_p)$ quantifies unincorporated high-order residual effects, thus ensuring that the expansion (13.49) provides an exact representation for $f(q)$. This is in contrast to the polynomial expansion (10.1) or (10.2), which requires an infinite number of terms to ensure that it is exact.

In practice, one typically employs the approximate expansion

$$f(q) \approx f_0 + \sum_{i=1}^{p} f_i(q_i) + \sum_{1 \le i < j \le p} f_{ij}(q_i, q_j) \tag{13.50}$$

based on the assumption that higher-order interaction terms have a negligible effect on the response. Whereas this assumption is reasonable for a number of applications, its feasibility for specific problems must be ascertained using further physical or numerical analysis. For example, it will not be valid if $y$ is discontinuous with respect to $q$. We note that this decomposition can be interpreted as iteratively fitting along coordinate-aligned subspaces, starting with the 0-D subspace associated with $f_0$.

Remark 13.5. In the context of the projection-based framework discussed in Section 13.2, the terms $f_0$, $f_i(q_i)$, and $f_{ij}(q_i, q_j)$ constitute the reduced-order basis functions $\psi_j$, and the generalized Fourier coefficients are unity. We detail the nature of the projections in the context of specific HDMR representations.

Remark 13.6. The representation (13.49) is analogous to the many-body expansions employed in molecular physics to quantify the energy due to atoms in a molecule. The truncated expansion (13.50) represents the case when higher-order interactions have a negligible effect on the energy.

Remark 13.7. The motivation for truncating HDMR expansions in high dimensions is related to the concentration of measure phenomenon, which, in its simplest form, states that every Lipschitz function is accurately approximated by a constant function if the dimension $p$ is sufficiently large; e.g., see page 7 of [47].

The representations (13.49) and (13.50) are not unique, so additional structure must be imposed to construct the components $f_i(q_i)$, $f_{ij}(q_i, q_j)$, and higher-order terms. As proven in [200, 227], each term $f_{i_1 \cdots i_s}(q_{i_1}, \ldots, q_{i_s})$, $s = 0, \ldots, p$, where $f_0$ corresponds to $s = 0$, is uniquely specified by minimizing the functional

$$\int_{\Gamma} \Big[ f(q) - \Big( f_0 + \sum_{i=1}^{p} f_i(q_i) + \cdots + \sum_{1 \le i_1 < \cdots < i_s \le p} f_{i_1 \cdots i_s}(q_{i_1}, \ldots, q_{i_s}) \Big) \Big]^2 d\mu(q) \tag{13.51}$$

subject to

$$\int_{[0,1]} f_{i_1 \cdots i_s}(q_{i_1}, \ldots, q_{i_s})\, d\mu_{i_k}(q_{i_k}) = 0 \tag{13.52}$$

for $k = 1, \ldots, s$. The measure $d\mu(q)$ defines a projection operator from $\Gamma$ onto subspaces defined by the individual components of the expansion, and hence it defines the particular form of the expansion. Readers unfamiliar with measure theory can interpret $d\mu(q)$ as $w(q)\,dq$, where $w(q)$ is a weight and $dq = dq_1 \cdots dq_p$. The constraint (13.52) ensures that the functions $f_{i_1 \cdots i_s}(q_{i_1}, \ldots, q_{i_s})$ and $f_{j_1 \cdots j_t}(q_{j_1}, \ldots, q_{j_t})$ are orthogonal; that is,

$$\int_{\Gamma} f_{i_1 \cdots i_s}(q_{i_1}, \ldots, q_{i_s})\, f_{j_1 \cdots j_t}(q_{j_1}, \ldots, q_{j_t})\, d\mu(q) = 0 \tag{13.53}$$

when at least one index differs in the sets $\{i_1, \ldots, i_s\}$ and $\{j_1, \ldots, j_t\}$.
13.5.1 ANOVA-HDMR

We first consider the case when $d\mu(q)$ is the Lebesgue measure on $\Gamma$, so that $d\mu(q) = \prod_{i=1}^{p} dq_i$. As detailed in [162, 200, 227], the zeroth-, first-, and second-order terms in this case are

$$\begin{aligned} f_0 &= \int_{\Gamma} f(q)\,dq,\\ f_i(q_i) &= \int_{\Gamma^{p-1}} f(q)\,dq_{\sim i} - f_0,\\ f_{ij}(q_i, q_j) &= \int_{\Gamma^{p-2}} f(q)\,dq_{\sim ij} - f_i(q_i) - f_j(q_j) - f_0, \end{aligned} \tag{13.54}$$

where $\Gamma^{p-1} = [0, 1]^{p-1}$ and $\Gamma^{p-2} = [0, 1]^{p-2}$. The notation $q_{\sim ij}$ denotes the vector having the components of $q$ except those in the set $\{i, j\}$; hence $dq_{\sim i} = dq_1 \cdots dq_{i-1}\, dq_{i+1} \cdots dq_p$.

This is the representation employed in analysis of variance (ANOVA) statistical techniques to determine the variance components of the response. It is central to the global sensitivity analysis detailed in Chapter 15.

As illustrated in the next example, one must include the density $\rho_Q(q)$ in the representations (13.54) if it is not uniform on $[0, 1]^p$. HDMR expansions for general densities are detailed in Section 15.1.2.

Example 13.8. To illustrate the analytic construction of $f_0$, $f_i(q_i)$, and $f_{ij}(q_i, q_j)$, consider the Ishigami function

$$f(q_1, q_2, q_3) = \sin q_1 + a \sin^2 q_2 + b q_3^4 \sin q_1$$

with the density

$$\rho_{Q_i}(q_i) = \begin{cases} \dfrac{1}{2\pi}, & -\pi \le q_i \le \pi,\\[4pt] 0, & \text{else}, \end{cases}$$

as proposed in [122] and detailed in [215]. As established in Exercise 13.3,

$$\begin{aligned} f_0 &= \frac{a}{2}, \quad f_1(q_1) = \Big(1 + \frac{b\pi^4}{5}\Big)\sin q_1, \quad f_2(q_2) = a\sin^2 q_2 - \frac{a}{2}, \quad f_3(q_3) = 0,\\ f_{12}(q_1, q_2) &= 0, \quad f_{13}(q_1, q_3) = \Big(b q_3^4 - \frac{b\pi^4}{5}\Big)\sin q_1, \quad f_{23}(q_2, q_3) = 0, \end{aligned} \tag{13.55}$$

and the residual term is $f_{123}(q_1, q_2, q_3) = 0$. Hence the representation (13.50) is exact. The orthogonality of the terms is established in Exercise 13.4.
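The relations (13.55) can be checked numerically by evaluating the integrals in (13.54) with a midpoint rule. This sketch assumes the commonly used values $a = 7$ and $b = 0.1$, which the text leaves unspecified, and uses $N = 60$ midpoint nodes per dimension. The function names are hypothetical.

```python
import math

A_PAR, B_PAR = 7.0, 0.1  # assumed values of a and b (a common choice, not given in the text)


def ishigami(q1, q2, q3, a=A_PAR, b=B_PAR):
    return math.sin(q1) + a * math.sin(q2) ** 2 + b * q3 ** 4 * math.sin(q1)


# Midpoint nodes on [-pi, pi]; with density 1/(2 pi), each node carries weight 1/N.
N = 60
nodes = [-math.pi + (k + 0.5) * 2 * math.pi / N for k in range(N)]

# f_0: integrate f over all of Gamma, cf. (13.54)
f0 = sum(ishigami(x, y, z) for x in nodes for y in nodes for z in nodes) / N ** 3


def f1(q1):
    """f_1(q_1): integrate over q_2 and q_3, then subtract f_0."""
    return sum(ishigami(q1, y, z) for y in nodes for z in nodes) / N ** 2 - f0


def f3(q3):
    """f_3(q_3): integrate over q_1 and q_2, then subtract f_0."""
    return sum(ishigami(x, y, q3) for x in nodes for y in nodes) / N ** 2 - f0


def f13(q1, q3):
    """f_13(q_1, q_3): integrate over q_2, then subtract f_1, f_3, and f_0."""
    return sum(ishigami(q1, y, q3) for y in nodes) / N - f1(q1) - f3(q3) - f0


print(f0, A_PAR / 2)  # both approximately 3.5
print(f1(1.0), (1 + B_PAR * math.pi ** 4 / 5) * math.sin(1.0))
```

The midpoint rule integrates the trigonometric dependence on $q_1$ and $q_2$ essentially exactly. The small remaining discrepancy in $f_1$ and $f_{13}$ comes from the non-periodic factor $q_3^4$.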

Whereas the representation (13.54) facilitates sensitivity analysis and uncertainty quantification, the construction of $f_0$ generally requires numerical integration over all of $\Gamma$, while one must integrate over all variables except $q_i$ when constructing $f_i(q_i)$. For moderate parameter dimensions, this can be achieved using the sparse grid or Monte Carlo techniques of Chapter 11. The latter yields random sampling (RS)-HDMR techniques. Alternatively, one can circumvent the difficulties associated with high-dimensional quadrature by employing cut-HDMR representations based on values of $f$ evaluated at nominal points $\bar q \in \Gamma$.


13.5.2 RS-HDMR 

Random sampling (RS)-HDMR techniques employ the Monte Carlo techniques of Section 11.1.4 to approximate the high-dimensional integrals in the expressions (13.54). This yields the approximate relation

$$f_0 \approx \frac{1}{R} \sum_{r=1}^{R} f(q^r) \tag{13.56}$$

for the mean, with analogous relations for $f_i(q_i)$ and $f_{ij}(q_i, q_j)$. As detailed in [152], however, the required number of random samples increases exponentially as the order of component functions grows, which rules out this direct approach for many problems.

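To reduce the sampling effort, one can represent the first- and second-order interaction terms as

$$f_i(q_i) \approx \sum_{k=1}^{K} \alpha_k^i \varphi_k(q_i), \qquad f_{ij}(q_i, q_j) \approx \sum_{k=1}^{K} \sum_{\ell=1}^{L} \beta_{k\ell}^{ij} \varphi_k(q_i) \varphi_\ell(q_j), \tag{13.57}$$

where $\varphi_k$ are Legendre polynomials scaled to the interval $[0, 1]$. It is illustrated in [152] that substitution of the expression

$$f(q) \approx f_0 + \sum_{i=1}^{p} \sum_{k=1}^{K} \alpha_k^i \varphi_k(q_i) + \sum_{1 \le i < j \le p} \sum_{k=1}^{K} \sum_{\ell=1}^{L} \beta_{k\ell}^{ij} \varphi_k(q_i) \varphi_\ell(q_j) \tag{13.58}$$

into the minimization problem (13.51) yields the equivalent minimization problems

$$\min_{\alpha^i} \int_0^1 \Big[ f_i(q_i) - \sum_{k=1}^{K} \alpha_k^i \varphi_k(q_i) \Big]^2 dq_i, \qquad \min_{\beta^{ij}} \int_0^1 \!\! \int_0^1 \Big[ f_{ij}(q_i, q_j) - \sum_{k=1}^{K} \sum_{\ell=1}^{L} \beta_{k\ell}^{ij} \varphi_k(q_i) \varphi_\ell(q_j) \Big]^2 dq_i\, dq_j$$

for the coefficients.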

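The coefficients $\alpha^i = [\alpha_1^i, \ldots, \alpha_K^i]^T$ are specified by the solution of the matrix system $\mathcal{A}\alpha^i = b$, where

$$\mathcal{A}_{k\ell} = \int_0^1 \varphi_k(q_i) \varphi_\ell(q_i)\,dq_i = \gamma_k \delta_{k\ell}$$

with $\gamma_k = \frac{1}{2k+1}$. The components of $b$ are given by

$$b_k = \int_0^1 f_i(q_i) \varphi_k(q_i)\,dq_i = \int_0^1 \Big[ \int_{\Gamma^{p-1}} f(q)\,dq_{\sim i} - f_0 \Big] \varphi_k(q_i)\,dq_i = \int_{\Gamma} f(q) \varphi_k(q_i)\,dq,$$

where the second and third equalities respectively follow from (13.54) and from the fact that $\int_0^1 \varphi_k(q_i)\,dq_i = 0$ for Legendre polynomials with $k \ge 1$. The components of $b$ can then be approximated by

$$b_k \approx \frac{1}{R} \sum_{r=1}^{R} f(q^r) \varphi_k(q_i^r),$$

which employs the same samples $f(q^r)$ used in (13.56) to approximate $f_0$. The coefficients $\alpha_k^i$ are thus approximated by

$$\alpha_k^i \approx \frac{1}{\gamma_k} \, \frac{1}{R} \sum_{r=1}^{R} f(q^r) \varphi_k(q_i^r). \tag{13.59}$$

If one employs the tensor product basis functions $\varphi_k(q_i)\varphi_\ell(q_j)$, the coefficients are specified in a similar manner by

$$\beta_{k\ell}^{ij} \approx \frac{1}{\gamma_{k\ell}} \, \frac{1}{R} \sum_{r=1}^{R} f(q^r) \varphi_k(q_i^r) \varphi_\ell(q_j^r), \tag{13.60}$$

where $\gamma_{k\ell} = \gamma_k \gamma_\ell$. It is established in Exercise 13.5 that (13.59) and (13.60) are precisely the relations yielded by the discrete projection methods discussed in Chapter 10. The only difference is the ordering of basis functions used to construct the multinomial basis functions.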
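The estimators (13.56) and (13.59) take only a few lines of code. The sketch below assumes a uniform density on $[0, 1]^p$ and uses shifted Legendre polynomials with $\gamma_k = 1/(2k+1)$. As described above, one set of $R$ samples serves both $f_0$ and every $\alpha_k^i$. The test function and its exact coefficients are a hypothetical illustration.

```python
import random


def shifted_legendre(k, x):
    """Legendre polynomial P_k scaled to [0, 1], via the three-term recurrence in t = 2x - 1."""
    t = 2.0 * x - 1.0
    if k == 0:
        return 1.0
    p_prev, p = 1.0, t
    for m in range(1, k):
        p_prev, p = p, ((2 * m + 1) * t * p - m * p_prev) / (m + 1)
    return p


def rs_hdmr_first_order(f, p, K, R, seed=0):
    """Monte Carlo estimates of f_0 (13.56) and of alpha^i_k (13.59) in
    f_i(q_i) ~ sum_k alpha^i_k phi_k(q_i), for q uniform on [0, 1]^p."""
    rng = random.Random(seed)
    samples = [[rng.random() for _ in range(p)] for _ in range(R)]
    values = [f(q) for q in samples]  # the same samples serve f_0 and every alpha
    f0 = sum(values) / R
    alpha = []
    for i in range(p):
        row = []
        for k in range(1, K + 1):
            gamma_k = 1.0 / (2 * k + 1)  # integral of phi_k^2 over [0, 1]
            b_k = sum(fv * shifted_legendre(k, q[i]) for fv, q in zip(values, samples)) / R
            row.append(b_k / gamma_k)
        alpha.append(row)
    return f0, alpha


# Hypothetical test function f(q) = q_1 + q_2^2 on [0, 1]^2 with exact values
# f_0 = 5/6, f_1 = q_1 - 1/2 = phi_1/2, and f_2 = q_2^2 - 1/3 = phi_1/2 + phi_2/6.
f0, alpha = rs_hdmr_first_order(lambda q: q[0] + q[1] ** 2, p=2, K=2, R=100_000)
```

A common variance-reduction variant subtracts the estimate of $f_0$ from each $f(q^r)$ before forming the sums, which is permissible because $\int_0^1 \varphi_k(q_i)\,dq_i = 0$. The sketch follows the formulas above literally.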

The connection between RS-HDMR, formulated using orthogonal polynomials, and discrete projection shows how this technique can be used to construct projection-based reduced-order models. The use of HDMR expansions for global sensitivity analysis is detailed in Chapter 15.

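13.5.3 Cut-HDMR

Cut-HDMR avoids the issues associated with high-dimensional quadrature by employing a Dirac measure $d\mu(q) = \prod_{i=1}^{p} \delta(q_i - \bar q_i)\,dq_i$ when minimizing the functional (13.51). This yields

$$\begin{aligned} f_0 &= f(\bar q),\\ f_i(q_i) &= f(q)\big|_{q = \bar q \backslash q_i} - f_0,\\ f_{ij}(q_i, q_j) &= f(q)\big|_{q = \bar q \backslash (q_i, q_j)} - f_i(q_i) - f_j(q_j) - f_0, \end{aligned} \tag{13.61}$$

where the notation $q = \bar q \backslash q_i$ indicates that the components of $q$ other than $q_i$ are set equal to those of the reference point; that is,

$$\bar q \backslash q_i = [\bar q_1, \ldots, \bar q_{i-1}, q_i, \bar q_{i+1}, \ldots, \bar q_p].$$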

The first-order projection in this case evaluates $f$ along the cut line through $\bar q$ in the $q_i$ direction, and higher-order projections are defined in an analogous manner. The name results from the property that component functions are defined along cut lines, planes, and hyperplanes through the reference point $\bar q$, as illustrated in Figure 13.8. This approach is also termed anchored HDMR.

The choice of reference or anchor points $\bar q$ is critical for cut-HDMR, especially when only first- and second-order terms are employed. As detailed in [90], choices include the mean value, so that $f(\bar q) \approx \mathbb{E}[f(q)]$, the centroid in uniform parameter spaces, or the centroid of sparse quadrature grids. Alternatively, one can employ multiple anchor points [152] or anchor points specified through the Morris screening techniques detailed in Chapter 15.

To employ the cut-HDMR representation as a surrogate model, one evaluates the component functions $f_i(q_i^m)$ and $f_{ij}(q_i^m, q_j^n)$ at $s$ discrete values $\{q_i^m\}_{m=1}^{s}$ along cut lines and planes through $\bar q$, as illustrated in Figure 13.8. The cost to construct the zeroth-, first-, and second-order terms is

$$1 + p(s - 1) + \frac{p(p - 1)(s - 1)^2}{2},$$

Figure 13.8. (a) Reference point $(\bar q_1, \bar q_2)$ and cut points $q_i^m$; (b) interpolating function $f_1(q_1)$.
which exhibits polynomial growth in $p$ and $s$, as compared with the exponential growth rate $s^p$ associated with tensor product sampling.
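The evaluation count above can be compared directly with tensor-product sampling. The sketch assumes, as the formula implies, that each cut line carries $s$ points, one of which coincides with the anchor, so that each line contributes only $s - 1$ new evaluations. The function name is illustrative.

```python
def cut_hdmr_cost(p, s):
    """Model evaluations for the zeroth-, first-, and second-order cut-HDMR terms
    (s points per cut line, one assumed to coincide with the anchor) versus an
    s^p tensor grid."""
    cut = 1 + p * (s - 1) + p * (p - 1) * (s - 1) ** 2 // 2
    return cut, s ** p


for p in (3, 10, 20):
    print(p, *cut_hdmr_cost(p, s=5))
```

For $p = 10$ and $s = 5$, for instance, the three cut-HDMR levels require 761 evaluations, compared with $5^{10} = 9{,}765{,}625$ for the full tensor grid.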

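The first- and second-order interaction terms for arbitrary points $q_i$ and $q_j$ can then be represented as

$$\begin{aligned} f_i(q_i) &= \sum_{m=1}^{s} f(\bar q \backslash q_i^m)\, L_m(q_i) - f_0,\\ f_{ij}(q_i, q_j) &= \sum_{m=1}^{s} \sum_{n=1}^{s} f(\bar q \backslash (q_i^m, q_j^n))\, L_{mn}(q_i, q_j) - f_i(q_i) - f_j(q_j) - f_0, \end{aligned} \tag{13.62}$$

where

$$L_m(q_i) = \prod_{\substack{k=1\\ k \ne m}}^{s} \frac{q_i - q_i^k}{q_i^m - q_i^k}$$

is the 1-D Lagrange interpolating polynomial that satisfies $L_m(q_i^k) = \delta_{mk}$ for $1 \le m, k \le s$. Similarly, $L_{mn}(q_i, q_j) = L_m(q_i) \otimes L_n(q_j)$ is the 2-D tensor product interpolating polynomial. A 1-D interpolating function constructed in this manner is illustrated in Figure 13.8(b).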
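The Lagrange representation (13.62) can be sketched as follows. Because the interpolant reproduces any polynomial of degree at most $s - 1$ in each variable, a cubic sampled at $s = 5$ points is recovered exactly. The node locations, function names, and test functions are hypothetical.

```python
def lagrange_basis(nodes, m, x):
    """L_m(x) = prod_{k != m} (x - q^k) / (q^m - q^k), so that L_m(q^k) = delta_mk."""
    value = 1.0
    for k, qk in enumerate(nodes):
        if k != m:
            value *= (x - qk) / (nodes[m] - qk)
    return value


def interpolate_on_cut_line(nodes, values, x):
    """sum_m values[m] * L_m(x), where values[m] are samples taken along one cut line."""
    return sum(v * lagrange_basis(nodes, m, x) for m, v in enumerate(values))


def interpolate_on_cut_plane(nodes_i, nodes_j, values, xi, xj):
    """Tensor-product interpolant sum_{m,n} values[m][n] * L_m(xi) * L_n(xj)."""
    return sum(values[m][n] * lagrange_basis(nodes_i, m, xi) * lagrange_basis(nodes_j, n, xj)
               for m in range(len(nodes_i)) for n in range(len(nodes_j)))


nodes = [0.0, 0.25, 0.5, 0.75, 1.0]  # s = 5 hypothetical cut points
g = lambda x: x ** 3 - x             # degree 3 < s, so the interpolation is exact
print(interpolate_on_cut_line(nodes, [g(x) for x in nodes], 0.3))  # close to g(0.3) = -0.273
```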


Example 13.9. We again consider the function

$$f(q_1, q_2, q_3) = \sin q_1 + a \sin^2 q_2 + b q_3^4 \sin q_1$$

of Example 13.8, where $q \in [-\pi, \pi]^3$. For the centroid reference point $\bar q = [0, 0, 0]$, the cut-HDMR components are

$$\begin{aligned} f_0 &= 0,\\ f_1(q_1) &= \sin q_1, \quad f_2(q_2) = a \sin^2 q_2, \quad f_3(q_3) = 0,\\ f_{12}(q_1, q_2) &= 0, \quad f_{13}(q_1, q_3) = b q_3^4 \sin q_1, \quad f_{23}(q_2, q_3) = 0, \end{aligned} \tag{13.63}$$

which differ from the ANOVA-HDMR components in (13.55). For example, the value $f_0 = 0$ differs greatly from the actual mean $f_0 = a/2$. However, the representation

$$f(q) = f_0 + \sum_{i=1}^{3} f_i(q_i) + \sum_{1 \le i < j \le 3} f_{ij}(q_i, q_j)$$

is still exact. This illustrates the inaccuracy in the statistical interpretation (15.15) that can occur when truncated cut-HDMR expansions with a single reference point are employed for highly nonlinear functions.
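Example 13.9 can be reproduced by building the cut-HDMR terms (13.61) directly from function evaluations through the anchor point. This sketch assumes $a = 7$ and $b = 0.1$, which the text does not specify. The helper names are hypothetical.

```python
import itertools
import math

A_PAR, B_PAR = 7.0, 0.1  # assumed values of a and b


def ishigami(q, a=A_PAR, b=B_PAR):
    return math.sin(q[0]) + a * math.sin(q[1]) ** 2 + b * q[2] ** 4 * math.sin(q[0])


def cut_hdmr_terms(f, anchor):
    """Return f_0 and callables f_i, f_ij of (13.61), built from evaluations of f
    along cut lines and planes through the anchor (reference) point."""
    f0 = f(anchor)

    def replace(updates):
        q = list(anchor)
        for idx, val in updates.items():
            q[idx] = val
        return q

    def fi(i, qi):
        return f(replace({i: qi})) - f0

    def fij(i, j, qi, qj):
        return f(replace({i: qi, j: qj})) - fi(i, qi) - fi(j, qj) - f0

    return f0, fi, fij


def second_order_cut(f, anchor, q):
    """Truncated expansion f_0 + sum_i f_i + sum_{i<j} f_ij evaluated at q."""
    f0, fi, fij = cut_hdmr_terms(f, anchor)
    p = len(q)
    return (f0 + sum(fi(i, q[i]) for i in range(p))
            + sum(fij(i, j, q[i], q[j]) for i, j in itertools.combinations(range(p), 2)))


anchor = [0.0, 0.0, 0.0]  # centroid of [-pi, pi]^3
f0, fi, fij = cut_hdmr_terms(ishigami, anchor)
q = [1.0, -2.0, 2.5]
print(f0, A_PAR / 2)  # cut-HDMR f_0 = 0.0 versus the ANOVA mean a/2 = 3.5
print(second_order_cut(ishigami, anchor, q) - ishigami(q))  # about 0: the truncation is exact here
```

The first printed line contrasts the cut-HDMR constant term with the mean $a/2$. The second confirms that the second-order truncation reproduces $f$ for this function.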
13.5.4 ANOVA-HDMR Based on Cut-HDMR Expansions

It is illustrated in Chapter 15 that ANOVA-HDMR expansions can be used to construct Sobol indices for global sensitivity analysis. However, this comes at the cost of high-dimensional integrals, which necessitate random sampling techniques or representations based on Legendre polynomials. Cut-HDMR techniques are significantly more efficient, but they do not directly yield expectation or variance relations since they replace integration with single point evaluation. Here we construct an ANOVA-HDMR expansion based on cut-HDMR terms, which can be used for global sensitivity analysis. To simplify the discussion, we consider uniform densities on the unit hypercube $\Gamma = [0, 1]^p$. The extension to iid random variables with general densities is illustrated in Section 15.1.2.

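For an arbitrary parameter $q = [q_1, \ldots, q_p]^T$ and anchor point $\bar q = [\bar q_1, \ldots, \bar q_p]^T$, the second-order ANOVA-HDMR based on the cut-HDMR expansion

$$f^{\mathrm{cut}}(q) = f_0^{\mathrm{cut}} + \sum_{i=1}^{p} f_i^{\mathrm{cut}}(q_i) + \sum_{1 \le i < j \le p} f_{ij}^{\mathrm{cut}}(q_i, q_j) \tag{13.64}$$

is

$$f^{\mathrm{ANOVA}}(q) = f_0^{\mathrm{ANOVA}} + \sum_{k=1}^{p} f_k^{\mathrm{ANOVA}}(q_k) + \sum_{1 \le k < \ell \le p} f_{k\ell}^{\mathrm{ANOVA}}(q_k, q_\ell), \tag{13.65}$$

where

$$\begin{aligned} f_0^{\mathrm{ANOVA}} &= \int_{\Gamma} f^{\mathrm{cut}}(q)\,dq,\\ f_k^{\mathrm{ANOVA}}(q_k) &= \int_{\Gamma^{p-1}} f^{\mathrm{cut}}(q)\,dq_{\sim k} - f_0^{\mathrm{ANOVA}},\\ f_{k\ell}^{\mathrm{ANOVA}}(q_k, q_\ell) &= \int_{\Gamma^{p-2}} f^{\mathrm{cut}}(q)\,dq_{\sim k\ell} - f_k^{\mathrm{ANOVA}}(q_k) - f_\ell^{\mathrm{ANOVA}}(q_\ell) - f_0^{\mathrm{ANOVA}}. \end{aligned}$$

To facilitate implementation, we employ the relations (13.62) to represent the first- and second-order cut-HDMR functions in terms of Lagrange expansions. We also employ the notation

$$S_1(q) = \sum_{i=1}^{p} f_i^{\mathrm{cut}}(q_i), \qquad S_2(q) = \sum_{1 \le i < j \le p} f_{ij}^{\mathrm{cut}}(q_i, q_j)$$

to simplify the discussion of these sums.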

The constant ANOVA term $f_0^{\mathrm{ANOVA}} = \mathbb{E}(Y)$ has the representation

$$f_0^{\mathrm{ANOVA}} = f_0^{\mathrm{cut}} + I_1 + I_2,$$

where

$$I_1 = \int_{\Gamma} S_1(q)\,dq = \sum_{i=1}^{p} \int_0^1 f_i^{\mathrm{cut}}(q_i)\,dq_i \tag{13.66}$$

involves approximation of $p$ 1-D integrals and

$$I_2 = \int_{\Gamma} S_2(q)\,dq = \sum_{1 \le i < j \le p} \int_0^1 \!\! \int_0^1 f_{ij}^{\mathrm{cut}}(q_i, q_j)\,dq_i\,dq_j$$

requires 2-D quadrature. This can be accomplished using Gauss-Legendre quadrature techniques, where the points $q_i^m$ are chosen to be the roots of the Legendre polynomials.
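The constant term $f_0^{\mathrm{ANOVA}} = f_0^{\mathrm{cut}} + I_1 + I_2$ can be sketched with a 3-point Gauss-Legendre rule on $[0, 1]$. When the cut points are the Gauss nodes, integrating the Lagrange interpolant of a cut function gives the same result as applying the Gauss rule to its node values. The sketch therefore evaluates the cut functions directly at the nodes. The test function is a hypothetical illustration with only pairwise interactions, so its cut-HDMR expansion is exact for any anchor and the result should equal its mean, 7/12.

```python
import itertools
import math

# 3-point Gauss-Legendre rule mapped from [-1, 1] to [0, 1]; exact for polynomials of degree <= 5.
GL_NODES = [0.5 - 0.5 * math.sqrt(0.6), 0.5, 0.5 + 0.5 * math.sqrt(0.6)]
GL_WEIGHTS = [5.0 / 18.0, 8.0 / 18.0, 5.0 / 18.0]


def anova_mean_from_cut(f, anchor):
    """f_0^ANOVA = f_0^cut + I_1 + I_2 for the second-order cut-HDMR of f,
    assuming a uniform density on [0, 1]^p."""
    p = len(anchor)
    f0 = f(anchor)

    def replace(updates):
        q = list(anchor)
        for idx, val in updates.items():
            q[idx] = val
        return q

    def f_i(i, x):
        return f(replace({i: x})) - f0

    def f_ij(i, j, x, y):
        return f(replace({i: x, j: y})) - f_i(i, x) - f_i(j, y) - f0

    rule = list(zip(GL_NODES, GL_WEIGHTS))
    # I_1: p one-dimensional integrals of the first-order cut terms
    I1 = sum(w * f_i(i, x) for i in range(p) for x, w in rule)
    # I_2: p(p-1)/2 two-dimensional integrals of the second-order cut terms
    I2 = sum(wx * wy * f_ij(i, j, x, y)
             for i, j in itertools.combinations(range(p), 2)
             for x, wx in rule for y, wy in rule)
    return f0 + I1 + I2


# Hypothetical example: f(q) = q_1 q_2 + q_3^2 has mean 1/4 + 1/3 = 7/12 on [0, 1]^3.
print(anova_mean_from_cut(lambda q: q[0] * q[1] + q[2] ** 2, [0.5, 0.5, 0.5]))
```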

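The first-order terms are

$$f_k^{\mathrm{ANOVA}}(q_k) = \int_{\Gamma^{p-1}} \big[ f_0^{\mathrm{cut}} + S_1(q) + S_2(q) \big]\,dq_{\sim k} - f_0^{\mathrm{ANOVA}}.$$

It is established in Exercise 13.6 that

$$\int_{\Gamma^{p-1}} S_1(q)\,dq_{\sim k} - \int_{\Gamma} S_1(q)\,dq = f_k^{\mathrm{cut}}(q_k) - \int_0^1 f_k^{\mathrm{cut}}(q_k)\,dq_k \tag{13.67}$$

and

$$\begin{aligned} \int_{\Gamma^{p-1}} S_2(q)\,dq_{\sim k} - \int_{\Gamma} S_2(q)\,dq &= \sum_{1 \le i < k} \Big[ \int_0^1 f_{ik}^{\mathrm{cut}}(q_i, q_k)\,dq_i - \int_0^1 \!\! \int_0^1 f_{ik}^{\mathrm{cut}}(q_i, q_k)\,dq_i\,dq_k \Big]\\ &\quad + \sum_{k < j \le p} \Big[ \int_0^1 f_{kj}^{\mathrm{cut}}(q_k, q_j)\,dq_j - \int_0^1 \!\! \int_0^1 f_{kj}^{\mathrm{cut}}(q_k, q_j)\,dq_k\,dq_j \Big], \end{aligned} \tag{13.68}$$

so that

$$\begin{aligned} f_k^{\mathrm{ANOVA}}(q_k) &= f_k^{\mathrm{cut}}(q_k) - \int_0^1 f_k^{\mathrm{cut}}(q_k)\,dq_k + \sum_{1 \le i < k} \Big[ \int_0^1 f_{ik}^{\mathrm{cut}}(q_i, q_k)\,dq_i - \int_0^1 \!\! \int_0^1 f_{ik}^{\mathrm{cut}}(q_i, q_k)\,dq_i\,dq_k \Big]\\ &\quad + \sum_{k < j \le p} \Big[ \int_0^1 f_{kj}^{\mathrm{cut}}(q_k, q_j)\,dq_j - \int_0^1 \!\! \int_0^1 f_{kj}^{\mathrm{cut}}(q_k, q_j)\,dq_k\,dq_j \Big]. \end{aligned} \tag{13.69}$$
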
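Similarly, it is established in Exercise 13.7 that

$$f_{k\ell}^{\mathrm{ANOVA}}(q_k, q_\ell) = f_{k\ell}^{\mathrm{cut}}(q_k, q_\ell) - \int_0^1 f_{k\ell}^{\mathrm{cut}}(q_k, q_\ell)\,dq_k - \int_0^1 f_{k\ell}^{\mathrm{cut}}(q_k, q_\ell)\,dq_\ell + \int_0^1 \!\! \int_0^1 f_{k\ell}^{\mathrm{cut}}(q_k, q_\ell)\,dq_k\,dq_\ell. \tag{13.70}$$

The relations (13.66), (13.69), and (13.70) can be employed in the manner described in Chapter 15 to construct Sobol global sensitivity indices.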
Remark 13.10. The accuracy of the expansion (13.65) is limited by the accuracy of the cut-HDMR (13.64). Hence, while it is advantageous for constructing the Sobol global sensitivity indices discussed in Chapter 15, it will not provide the full accuracy attained using $f_0$, $f_i(q_i)$, and $f_{ij}(q_i, q_j)$ given by (13.54) or the Legendre polynomial representations detailed in Section 13.5.2. This is illustrated in Exercise 13.8.


13.6 Surrogate-Based Bayesian Model Calibration

In Chapter 8, we detailed techniques for constructing input densities either through the direct application of Bayes' relation

$$\pi(q|v) = \frac{\pi(v|q)\,\pi_0(q)}{\int_{\mathbb{R}^p} \pi(v|q)\,\pi_0(q)\,dq}, \tag{13.71}$$

where $\pi_0(q)$, $\pi(v|q)$, and $\pi(q|v)$ respectively denote the prior density, likelihood, and posterior density, or by constructing Markov chains whose stationary distribution is the posterior density. Based on the assumption that measurement errors are iid and normally distributed, $\varepsilon_i \sim N(0, \sigma^2)$, we employ the likelihood function

$$\pi(v|q) = L(q, \sigma^2 | v) = \frac{1}{(2\pi\sigma^2)^{n/2}}\, e^{-SS_q/2\sigma^2}, \tag{13.72}$$

where

$$SS_q = \sum_{i=1}^{n} [v_i - f_i(q)]^2 \tag{13.73}$$

is the sum of squares error. Here $v_i$ and $f_i(q)$ respectively denote the data and corresponding model response; i.e., $f_i(q) = f(t_i, q)$ or $f_i(q) = f(x_i, q)$ for evolutionary or stationary processes. The often insurmountable difficulty in approximating the integral in (13.71) or running sufficiently long MCMC chains is the computational expense of constructing the required number of model solutions $f_i(q)$ for complex, high-fidelity models such as multiphysics models or multidimensional nonlinear PDEs. Here we discuss the use of surrogate models for Bayesian model calibration.

The most direct approach is to replace high-fidelity evaluations $f_i(q)$ with the highly efficient surrogate evaluations $f_i^S(q)$ and employ the surrogate likelihood function

$$\pi^S(v|q) = L^S(q, \sigma^2 | v) = \frac{1}{(2\pi\sigma^2)^{n/2}}\, e^{-SS_q^S/2\sigma^2}, \tag{13.74}$$

where

$$SS_q^S = \sum_{i=1}^{n} [v_i - f_i^S(q)]^2. \tag{13.75}$$

In this case, the surrogate is constructed by sampling from the prior, in a manner analogous to that discussed in Chapter 10 for sampling from the posterior to propagate input uncertainties through the model. Furthermore, sampling from the prior is typically easier than sampling from the posterior in the sense that the assumption of mutually independent parameters is always satisfied when using noninformative priors defined on the admissible parameter space. Hence one can represent the joint prior as

$$\pi_0(q) = \prod_{i=1}^{p} \pi_0^i(q_i),$$

where $\pi_0^i(q_i)$ is the marginal prior density for $q_i$. As detailed in Section 5.2, this is in contrast to the posterior density, which typically cannot be represented as a product of the marginal densities due to correlation among parameters.

This approach is detailed in the context of response surface models in [20] and Gaussian process models in [134]. Convergence analysis and examples illustrating the use of stochastic Galerkin and collocation representations for efficient Bayesian model calibration are reported in [167, 168].

To illustrate, consider the reduced-order model

$$f_i^S(q) = u^{J_r}(t_i, x_i, q) = \sum_{j=1}^{J_r} \alpha_j(t_i)\, \varphi_j(x_i) \tag{13.76}$$

given by (13.30). Once the coefficients $\alpha_j(t)$ have been determined by solving (13.29), one can evaluate the surrogate likelihood (13.74) for quadrature values $q^r$ required to evaluate (13.71) or chain values $q^k$ generated by MCMC algorithms. This permits extensive sampling at a fraction of the computational cost required for the high-fidelity simulations of (13.28).
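The surrogate likelihood (13.74) can be used in a standard random-walk Metropolis sampler without other changes. The sketch below is a minimal single-parameter illustration. It assumes a uniform prior on an interval, a hypothetical linear surrogate $f^S(t, q) = qt$, and noise-free synthetic data. In practice the surrogate would be, for example, the reduced-order model (13.76).

```python
import math
import random


def surrogate_ss(q, data, times, surrogate):
    """Sum of squares SS_q^S = sum_i [v_i - f_i^S(q)]^2, cf. (13.75)."""
    return sum((v - surrogate(t, q)) ** 2 for v, t in zip(data, times))


def metropolis_with_surrogate(data, times, surrogate, sigma, q0, lower, upper,
                              step=0.05, n_iter=20_000, seed=1):
    """Random-walk Metropolis for a scalar parameter q using the surrogate likelihood
    (13.74) and a uniform prior on [lower, upper]; only the surrogate is evaluated."""
    rng = random.Random(seed)
    q, ss = q0, surrogate_ss(q0, data, times, surrogate)
    chain = []
    for _ in range(n_iter):
        q_new = q + step * rng.gauss(0.0, 1.0)
        if lower <= q_new <= upper:  # proposals outside the prior support are rejected
            ss_new = surrogate_ss(q_new, data, times, surrogate)
            # log acceptance ratio; the factor (2 pi sigma^2)^(-n/2) and the flat prior cancel
            if math.log(1.0 - rng.random()) < -(ss_new - ss) / (2.0 * sigma ** 2):
                q, ss = q_new, ss_new
        chain.append(q)
    return chain


# Hypothetical setting: surrogate f^S(t, q) = q t and noise-free synthetic data generated with q = 2.
times = [i / 19 for i in range(20)]
data = [2.0 * t for t in times]
chain = metropolis_with_surrogate(data, times, lambda t, q: q * t, sigma=0.1,
                                  q0=1.0, lower=0.0, upper=5.0)
burned = chain[2_000:]
print(sum(burned) / len(burned))  # posterior mean, close to 2
```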

13.7 Notes and References

The discussion of regression- and interpolation-based models is necessarily brief. Readers are referred to [39 1 67, 236] for details regarding the statistical theory of response surface functions and kriging models and to [84, 85, 89] for discussions illustrating the use of surrogate models for engineering design and optimization. Further details regarding the construction of surrogate models based on Gaussian process representations can be found in [199]. The use of response surface models, Gaussian process models, and stochastic collocation techniques to construct surrogate likelihoods for Bayesian model calibration is detailed in [20, 45, 108, 133, 134, 168, 198]. Software for constructing Gaussian process and kriging models is available in the Sandia package DAKOTA [1, 5]. Response surface models can also be constructed using the MATLAB Surrogate Modeling (SUMO) toolbox.

There is a large literature on POD representations for control applications. Readers are referred to [7, 140, 14’J, 160, 161, 195, 203, 257] and the references therein for details and error analysis regarding POD and adaptive POD. A general discussion of POD is provided in [142]. The more recent use of POD representations to facilitate surrogate likelihood implementation and uncertainty propagation is illustrated in [46, 87, 88, 154]. This includes the use of greedy sampling algorithms to adequately cover high-dimensional sample spaces.
HDMR expansions have proven advantageous both as surrogate models [3, 8, 152, 153, 102, 200, 201] and for the structure they provide for global sensitivity analysis using variance-based Sobol indices [177, 215, 217]. We detail this latter role in Chapter 15.


13.8 Exercises 

Exercise 13.1. For $R: L^2(\mathcal{D}) \to L^2(\mathcal{D})$ given by (13.38) and $\psi$ expanded in the manner (13.36), show that

$$(R\psi, \psi) = \frac{1}{M} \sum_{m=1}^{M} |(v_m, \psi)|^2$$

and

$$(R\phi, \psi) = (\phi, R\psi)$$

for all $\phi, \psi \in L^2(\mathcal{D})$. This establishes that $R$ is a nonnegative and symmetric operator on $L^2(\mathcal{D})$.

Exercise 13.2. For the basis functions $\varphi_j$ defined in (13.44), show that

$$(\varphi_j, \varphi_k) = \delta_{jk}.$$

This establishes that they constitute an orthonormal basis set.

Exercise 13.3. For $f(q) = \sin q_1 + a \sin^2 q_2 + b q_3^4 \sin q_1$ considered in Example 13.8, use the relations (13.54) to establish the relations (13.55) for the ANOVA-HDMR interaction terms.


Exercise 13.4. Show that the terms $f_0$, $f_i(q_i)$, and $f_{ij}(q_i, q_j)$ constructed in Exercise 13.3 are orthogonal.


Exercise 13.5. Consider the spring model

$$m\frac{d^2z}{dt^2} + c\frac{dz}{dt} + kz = F_0 \cos(\omega_F t), \qquad z(0) = z_0, \quad \frac{dz}{dt}(0) = z_1,$$

with the parameters $q$. As illustrated in Examples 3.2 and 9.7, the response

$$y(\omega_F, q) = \frac{1}{\sqrt{(k - m\omega_F^2)^2 + (c\,\omega_F)^2}}$$

quantifies the relative magnitude of the displacement as a function of the driving frequency $\omega_F$. In Section 10.1, we illustrated the use of the discrete projection method to propagate means and variances in the response.

Show that the RS-HDMR techniques of Section 13.5.2 yield the same relations as discrete projection when Legendre polynomials are used to represent the first- and second-order interactions. The only difference is the ordering of basis functions used to construct the multinomial basis functions.


Exercise 13.6. Establish the relations (13.67) and (13.68) for $\int_{\Gamma^{p-1}} S_1(q)\,dq_{\sim k} - \int_{\Gamma} S_1(q)\,dq$ and $\int_{\Gamma^{p-1}} S_2(q)\,dq_{\sim k} - \int_{\Gamma} S_2(q)\,dq$.

Exercise 13.7. Establish the relation (13.70) for $f_{k\ell}^{\mathrm{ANOVA}}(q_k, q_\ell)$.

Exercise 13.8. Use the expressions (13.66), (13.69), and (13.70) to compute the ANOVA-HDMR components for the function

$$f(q_1, q_2, q_3) = \sin q_1 + a \sin^2 q_2 + b q_3^4 \sin q_1$$

based on the cut-HDMR components (13.63). Note that they differ from the true ANOVA-HDMR components (13.55) and hence will be inaccurate if used for global sensitivity analysis based on Sobol indices. This illustrates Remark 13.10, which notes that the accuracy of ANOVA components computed in this manner will be limited by the accuracy of the cut-HDMR terms.
Local Sensitivity Analysis


The term sensitivity analysis has different connotations in various modeling communities, and even in mathematics its meaning has evolved significantly over the last 25 years. The objective of sensitivity analysis can be broadly viewed as quantifying the relative contributions due to individual parameters or inputs and determining how variations in parameters affect measured responses. The reasons for sensitivity analysis include the following:

* ascertain whether the model is robust or overly fragile with regard to various parameters;

* determine whether the model can be simplified by fixing insensitive parameters;

* specify regimes in the parameter space that optimally impact responses or their uncertainties;

* guide experimental design to determine measurement regimes that have the greatest impact on parameter or response sensitivity.

The methods for sensitivity analysis are typically classified as local or global.

Definition 14.1 (Local Sensitivity Analysis). Local sensitivity analysis focuses on the variability of the response when parameters or inputs are perturbed about a nominal value. This is typically achieved, and often defined, by the derivatives $\frac{\partial y}{\partial q_i}$ of the response with respect to the individual parameters. This technique is often employed to identify insensitive parameters that can be fixed in models, since their variation has minimal influence on outputs. Local sensitivity analysis is also central to optimization, adjoint methods, and model calibration. Whereas local theory constitutes the majority of sensitivity analysis in the literature, the examples in Chapter 15 illustrate limitations of this approach when investigating the global behavior of nonlinearly parameterized problems.

There are two differing concepts of global sensitivity analysis. The first, which is detailed in [50], focuses on determining all critical points for a system (e.g., bifurcations) and performing local sensitivity analysis about these points. The second is more statistical in nature and focuses on the relation between input and output uncertainties for a model. Due to its direct ties to uncertainty quantification, we focus on the latter.

Definition 14.2 (Global Sensitivity Analysis). One objective of global sensitivity analysis is to ascertain how uncertainty in model outputs can be apportioned to uncertainties in model inputs, taken either singly or in combination, when considered over the entire range of input values. This analysis is focused solely on properties of the model and does not rely on experimental data. Hence global sensitivity analysis of this form complements uncertainty quantification, which focuses on the determination of input and response distributions or confidence intervals based on measured data. As detailed in Chapter 15, global sensitivity analysis techniques can be broadly categorized as regression-, variance-, or screening-based methods.


We employed local sensitivity analysis for several facets of model calibration and uncertainty quantification. In Chapter 7, we illustrated that the sensitivity matrix was required to construct the Fisher information matrix for nonlinearly parameterized models. Similar analysis in Chapter 8 illustrated the role played by the sensitivity equations in formulating the covariance matrix used to construct initial proposal densities for MCMC routines. It was illustrated in Chapter 9 that the product of the sensitivity matrix, covariance matrix, and transpose of the sensitivity matrix provides the response covariance matrix for linear perturbations. Local sensitivity analysis is also central to the parameter selection techniques of Chapter 6.

In Section 7.3.1, we noted three techniques that can be used to construct local sensitivities: finite difference approximations, solution of sensitivity equations, and automatic differentiation. The following example illustrates these techniques and motivates the more general local sensitivity approach detailed in this chapter.

Example 14.3. Consider the nonlinear spring model

$$\frac{d^2z}{dt^2} + C\left(\frac{dz}{dt}\right)^2 + Kz = 0, \qquad z(0) = 2, \quad \frac{dz}{dt}(0) = 0, \tag{14.1}$$

with the responses

$$y(t) = z(t; K, C) \tag{14.2}$$

and

$$y = \int \gamma(t)\, z(t)\,dt. \tag{14.3}$$

For the latter case, $\gamma$ acts as a filter or weight over the time interval of interest.
For the response (14.2), the parameters are $q = [K, C]$. To construct sensitivity equations, one differentiates (14.1) with respect to each parameter and switches the order of differentiation to obtain

$$\begin{aligned} &\frac{d^2 z_K}{dt^2} + 2C\,\frac{dz}{dt}\frac{dz_K}{dt} + K z_K = -z, && z_K(0) = 0, \quad \frac{dz_K}{dt}(0) = 0,\\ &\frac{d^2 z_C}{dt^2} + 2C\,\frac{dz}{dt}\frac{dz_C}{dt} + K z_C = -\left(\frac{dz}{dt}\right)^2, && z_C(0) = 0, \quad \frac{dz_C}{dt}(0) = 0, \end{aligned} \tag{14.4}$$

where $z_K(t) = \frac{\partial z}{\partial K}(t)$ and $z_C(t) = \frac{\partial z}{\partial C}(t)$. Because there is no analytic solution, one must use numerical routines to approximate $z(t)$, $z_K(t)$, and $z_C(t)$. The response sensitivities $\frac{\partial y}{\partial K} = z_K(t)$ and $\frac{\partial y}{\partial C} = z_C(t)$ then follow from (14.2).

Alternatively, one can approximate the response sensitivities using the finite difference relations

$$\frac{\partial y}{\partial K} \approx \frac{z(t; K + h_K, C) - z(t; K, C)}{h_K}, \qquad \frac{\partial y}{\partial C} \approx \frac{z(t; K, C + h_C) - z(t; K, C)}{h_C},$$

which are specific cases of (7.38). As illustrated in Exercise 7.2, the accuracy of this approach is highly dependent on the choices of $h_K$ and $h_C$, which must be scaled according to the magnitude of the parameters. Hence, while this approach is the simplest conceptually, it is often the least effective.
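The sensitivity equations (14.4) and the finite difference relations can be compared numerically. This sketch integrates (14.1) together with (14.4) using a fixed-step classical fourth-order Runge-Kutta scheme. The parameter values $K = 20.5$ and $C = 0.1$, the observation time $T = 2$, and the relative step $10^{-6}$ are hypothetical choices.

```python
def spring_rhs(state, K, C):
    """Right-hand side of z'' + C (z')^2 + K z = 0 (14.1), augmented with the
    sensitivity equations (14.4) for z_K = dz/dK and z_C = dz/dC."""
    z, dz, zK, dzK, zC, dzC = state
    return [dz, -C * dz ** 2 - K * z,
            dzK, -2 * C * dz * dzK - K * zK - z,
            dzC, -2 * C * dz * dzC - K * zC - dz ** 2]


def solve_spring(K, C, T, n_steps):
    """Classical RK4 from z(0) = 2, z'(0) = 0 with zero sensitivity initial conditions.
    Returns [z, z', z_K, z_K', z_C, z_C'] at time T."""
    h = T / n_steps
    y = [2.0, 0.0, 0.0, 0.0, 0.0, 0.0]
    for _ in range(n_steps):
        k1 = spring_rhs(y, K, C)
        k2 = spring_rhs([a + 0.5 * h * b for a, b in zip(y, k1)], K, C)
        k3 = spring_rhs([a + 0.5 * h * b for a, b in zip(y, k2)], K, C)
        k4 = spring_rhs([a + h * b for a, b in zip(y, k3)], K, C)
        y = [a + h / 6.0 * (b1 + 2 * b2 + 2 * b3 + b4)
             for a, b1, b2, b3, b4 in zip(y, k1, k2, k3, k4)]
    return y


# Hypothetical parameter values, observation time, and number of steps.
K, C, T, n = 20.5, 0.1, 2.0, 2000
z, _, zK, _, zC, _ = solve_spring(K, C, T, n)

# Forward differences with steps scaled to the parameter magnitudes.
hK, hC = 1e-6 * K, 1e-6 * C
fdK = (solve_spring(K + hK, C, T, n)[0] - z) / hK
fdC = (solve_spring(K, C + hC, T, n)[0] - z) / hC
print(zK, fdK)
print(zC, fdC)
```

With the same fixed step, differentiating the explicit Runge-Kutta update with respect to a parameter gives the same result as applying it to the augmented system. The two estimates therefore differ only by finite difference truncation and round-off error. Changing the relative step by several orders of magnitude in either direction shows the sensitivity to $h_K$ and $h_C$ noted above.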

For evolution problems of this type, automatic differentiation (AD) software provides a third option for computing sensitivities. This approach exploits the fact that all computer programs can be decomposed into a combination of elementary arithmetic operations (e.g., addition, subtraction, multiplication, and division) and function evaluations (e.g., exponentials, logarithms, cosines, and sines). AD software computes derivatives of arbitrary order by applying the chain rule to these operations. AD routines have been developed for MATLAB, C/C++, Fortran, and Python and have been incorporated into packages such as Trilinos. We emphasize that automatic differentiation is fundamentally different from symbolic differentiation, in which algorithms manipulate the mathematical model expressions.

Now consider the model (14.1) with the response (14.3) and parameter set $q = [K, C, \gamma(t)]$. Sensitivity equation and AD techniques cannot be directly employed to construct $\frac{\partial y}{\partial \gamma}$, and finite difference approximations will have limited accuracy. Furthermore, these techniques do not provide the capability for quantifying the sensitivity of $y$ when all three parameters are simultaneously varied. This motivates the more general local sensitivity analysis techniques detailed in this chapter.


We discuss techniques, based on Gateaux variations, for sensitivity analysis that are applicable to evolution and stationary differential equations, integral equations, and algebraic systems. Furthermore, the techniques can be used to quantify the local response sensitivity when multiple inputs are perturbed simultaneously.

To simplify the analysis, we focus on systems that exhibit a linear state dependence, which permits the decoupling of state and adjoint equations. As noted in Section 14.3, similar analysis can be applied to nonlinear state-dependent systems, but the coupling between state and adjoint equations complicates their solution.
To motivate attributes and issues associated with the forward sensitivity analysis procedure (FSAP) and adjoint sensitivity analysis procedure (ASAP), we consider the neutron transport models detailed in Example 3.6. In Section 14.2, we summarize and illustrate the functional analytic theory of [50], which is applicable to a broad class of linear and nonlinear state-dependent models.


14.1 Motivating Examples — Neutron Diffusion 

To illustrate the FSAP and ASAP, we consider the neutron diffusion equation

$$\Sigma_a\varphi - D\frac{d^2\varphi}{dx^2} = S, \quad x \in (-a, a), \qquad \varphi(\pm a) = 0 \qquad (14.5)$$

with a response

$$y = \Sigma_d\,\varphi(b) \qquad (14.6)$$

measured using a detector located at $x = b$. As detailed in Example 3.6, $\varphi$, $D$, $\Sigma_a$, and $S$ respectively denote the neutron flux, a diffusion coefficient, a macroscopic absorption cross-section, and a constant distributed source. The parameter set is

$$q = [\Sigma_a, D, S, \Sigma_d], \qquad (14.7)$$

and it is assumed that we know nominal values $\bar q$ and variations $\delta q$. For example, one might take $\bar q$ to be mean values and $\delta q$ to be one standard deviation for each of the components. Other choices are possible, so, throughout this chapter, $\bar q$ should be interpreted as nominal rather than solely as mean values of $q$. For the nominal parameter values, the solution to (14.5) is

$$\bar\varphi(x) = \frac{\bar S}{\bar\Sigma_a}\left[1 - \frac{\cosh(xk)}{\cosh(ak)}\right], \qquad k = \sqrt{\bar\Sigma_a/\bar D}. \qquad (14.8)$$


14.1.1 Matrix System 

We illustrated in Example 3.6 that a central difference Taylor approximation for the second derivative yields the matrix system

$$A(q)\varphi = s(q),$$
$$y = C^T(q)\varphi = C^T(q)A^{-1}(q)s(q),$$

where $\varphi = [\varphi_1, \ldots, \varphi_{N-1}]^T$ with $\varphi_j \approx \varphi(x_j)$, and $A(q)$, $C(q)$, $s(q)$ are given in (3.26) and (3.32). The nominal parameter values $\bar q$ yield nominal values $\bar A = A(\bar q)$, $\bar s = s(\bar q)$, and $\bar C = C(\bar q)$ as well as the nominal solution $\bar\varphi = \bar A^{-1}\bar s$ and response $\bar y = \bar C^T\bar\varphi$. Furthermore, parameter variations $\delta q$ produce corresponding variations $\delta A$, $\delta s$, $\delta C$, $\delta\varphi$, and $\delta y$ in the system components, solution, and response.

We now illustrate the FSAP and ASAP to quantify the effect of the parameter variations $\delta q$ on perturbations $\delta\varphi$ and $\delta y$ in the solution and response.
Forward Sensitivity Analysis Procedure (FSAP)

From

$$y(q) = C^T(q)A^{-1}(q)s(q),$$

we can formally express $\delta y$ as

$$\delta y = \frac{\partial y}{\partial\Sigma_a}\delta\Sigma_a + \frac{\partial y}{\partial D}\delta D + \frac{\partial y}{\partial S}\delta S + \frac{\partial y}{\partial\Sigma_d}\delta\Sigma_d.$$

The difficulty is that computation of the sensitivities requires differentiation of $A^{-1}(q)$, which typically does not have a closed-form representation.

Instead, we use the Gateaux variation, defined in Appendix A, to directly construct forward sensitivity equations and response perturbations $\delta y$. We recall that the Gateaux variation is a functional analytic generalization of the directional derivative from multivariable calculus. Hence it quantifies responses of a functional, evaluated at a point in the vector space, in various directions.

From Definition A.7, the Gateaux variation of $y(q) - C^T(q)\varphi(q) = 0$, evaluated at the nominal parameter and response values $\bar q$ and $\bar y$, is

$$\frac{d}{d\varepsilon}\Big[(\bar y + \varepsilon\delta y) - (\bar C + \varepsilon\delta C)^T(\bar\varphi + \varepsilon\delta\varphi)\Big]_{\varepsilon=0} = 0, \qquad (14.9)$$

which yields the response perturbation relation

$$\delta y = \bar C^T\delta\varphi + \delta C^T\bar\varphi. \qquad (14.10)$$

The terms $\bar C$ and $\delta C$ can be computed from (3.32) using known values for $\bar\Sigma_d$ and $\delta\Sigma_d$. Similarly, $\bar\varphi = A^{-1}(\bar q)s(\bar q)$ can be computed using nominal parameter values. To compute $\delta\varphi$, we apply the Gateaux variation to $A(q)\varphi - s(q) = 0$ to obtain

$$\frac{d}{d\varepsilon}\Big[(\bar A + \varepsilon\delta A)(\bar\varphi + \varepsilon\delta\varphi) - (\bar s + \varepsilon\delta s)\Big]_{\varepsilon=0} = 0.$$

This yields the sensitivity equations

$$\bar A\,\delta\varphi = \delta s - \delta A\,\bar\varphi \qquad (14.11)$$

that are solved to obtain $\delta\varphi$. In summary, one solves the nominal system $\bar A\bar\varphi = \bar s$ and the sensitivity system (14.11) in the FSAP to obtain $\bar\varphi$ and $\delta\varphi$ and hence compute the perturbation response (14.10).
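The FSAP steps can be sketched for a hypothetical discretization. The assembly below stands in for (3.26) and (3.32) and is an assumption: it uses the standard three-point stencil on a uniform grid and models the detector as a point value at $x = b$. All parameter values are illustrative.

```python
import numpy as np

def assemble(q, a=1.0, N=100, b=0.5):
    """Hypothetical discretization of (14.5)-(14.6); q = [Sigma_a, D, S, Sigma_d].
    Returns A(q), s(q), C(q) on the interior nodes x_1, ..., x_{N-1}."""
    Sa, D, S, Sd = q
    h = 2 * a / N
    x = -a + h * np.arange(1, N)
    T = (2 * np.eye(N - 1) - np.eye(N - 1, k=1) - np.eye(N - 1, k=-1)) / h**2  # -d^2/dx^2
    A = Sa * np.eye(N - 1) + D * T
    s = S * np.ones(N - 1)
    C = np.zeros(N - 1)
    C[np.argmin(np.abs(x - b))] = Sd        # point detector at x = b (assumption)
    return A, s, C

q_bar = np.array([1.0, 0.5, 2.0, 0.3])      # illustrative nominal values
dq = np.array([0.05, -0.02, 0.1, 0.01])     # illustrative variations

A, s, C = assemble(q_bar)
# A, s, C are linear in q in this discretization, so their variations are A(dq), s(dq), C(dq).
dA, ds, dC = assemble(dq)

phi = np.linalg.solve(A, s)                 # nominal system A phi = s
dphi = np.linalg.solve(A, ds - dA @ phi)    # sensitivity system (14.11)
dy = C @ dphi + dC @ phi                    # response perturbation (14.10)
print(dy)
```

Each new set of variations `dq` that changes `dA` or `ds` requires another solve with `A`, which is the cost discussed next.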

The disadvantage of this approach is that (14.11) must be re-solved to accommodate information resulting from new data. For example, changes in $\delta D$, $\delta S$, or $\delta\Sigma_a$, due to additional measurements or further Bayesian analysis, produce corresponding changes in $\delta A$ and $\delta s$. For large systems, the repeated solution of (14.11) to incorporate new information is often prohibitive, and it is avoided in the ASAP.
Adjoint Sensitivity Analysis Procedure (ASAP): Perturbation Approach

We will illustrate the ASAP from two perspectives. For this example, the first is more fundamental, and it is in the spirit of the general functional analytic approach summarized in Section 14.2. The second illustrates the variational perspective, which, for general problems, requires fewer results from functional analysis and can provide a more natural framework for incorporating constraints.

We begin by defining the adjoint sensitivity equation

$$\bar A^T\psi = \bar C, \qquad (14.12)$$

where $\psi$ is termed the adjoint function and $\bar A^T$ is the transpose of $\bar A$. We will motivate (14.12) in subsequent discussion.

We then consider the dot or inner product

$$\langle\psi, \delta s - \delta A\bar\varphi\rangle = \langle\psi, \bar A\delta\varphi\rangle \qquad (14.13)$$

obtained by multiplying (14.11) by $\psi^T$. From the bilinear identity

$$\langle v, z\rangle = v^Tz = z^Tv = \langle z, v\rangle,$$

it follows that

$$\langle\psi, \bar A\delta\varphi\rangle = \psi^T\bar A\delta\varphi = \delta\varphi^T\bar A^T\psi = \langle\delta\varphi, \bar A^T\psi\rangle. \qquad (14.14)$$

The combination of (14.12)-(14.14) yields

$$\langle\psi, \delta s - \delta A\bar\varphi\rangle = \langle\delta\varphi, \bar C\rangle = \bar C^T\delta\varphi. \qquad (14.15)$$

The final term is precisely the component of the response perturbation (14.10) that is unknown, so we substitute (14.15) into that relation to obtain

$$\delta y = \delta C^T\bar\varphi + \psi^T[\delta s - \delta A\bar\varphi]. \qquad (14.16)$$

To employ (14.16), one must solve the adjoint equation (14.12) for $\psi$, but only once since it depends only on nominal parameter values. Variations $\delta y$ due to updated values of $\delta C$, $\delta s$, and $\delta A$ are then easily computed since (14.16) requires only vector multiplication rather than re-solving linear systems.
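A minimal sketch of this reuse follows, assuming a hypothetical three-point finite difference discretization (standing in for (3.26) and (3.32), with a point detector at $x = b$) and randomly drawn, illustrative variations. The adjoint equation (14.12) is solved once, and each new set of variations is then processed with products only. The forward result from (14.10)-(14.11) is computed alongside purely as a check.

```python
import numpy as np

def assemble(q, a=1.0, N=100, b=0.5):
    """Hypothetical discretization of (14.5)-(14.6); q = [Sigma_a, D, S, Sigma_d]."""
    Sa, D, S, Sd = q
    h = 2 * a / N
    x = -a + h * np.arange(1, N)
    T = (2 * np.eye(N - 1) - np.eye(N - 1, k=1) - np.eye(N - 1, k=-1)) / h**2
    C = np.zeros(N - 1)
    C[np.argmin(np.abs(x - b))] = Sd          # point detector at x = b (assumption)
    return Sa * np.eye(N - 1) + D * T, S * np.ones(N - 1), C

q_bar = np.array([1.0, 0.5, 2.0, 0.3])        # illustrative nominal values
A, s, C = assemble(q_bar)
phi = np.linalg.solve(A, s)
psi = np.linalg.solve(A.T, C)                 # adjoint equation (14.12), solved once

rng = np.random.default_rng(0)
pairs = []
for _ in range(5):
    dq = 0.05 * rng.standard_normal(4)        # updated variations (illustrative)
    dA, ds, dC = assemble(dq)                 # A, s, C are linear in q here
    dy_adj = dC @ phi + psi @ (ds - dA @ phi)                    # (14.16): products only
    dy_fwd = C @ np.linalg.solve(A, ds - dA @ phi) + dC @ phi    # (14.10)-(14.11), check only
    pairs.append((dy_adj, dy_fwd))
    print(f"{dy_adj: .10f}  {dy_fwd: .10f}")
```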

Multiple Responses. For this example, $y = C^T\varphi$ is a single response. In general, we will have $\nu$ responses or measurements. Note that this can be achieved by specifying a $\nu \times (N-1)$ observation matrix $C^T$. The perturbation response relation (14.10) is unchanged, and the products of the $i$th rows of $\bar C^T$ and $\delta C^T$ with $\delta\varphi$ and $\bar\varphi$ specify the $i$th row of $\delta y$. For the FSAP, the sensitivity equations (14.11) used to specify $\delta\varphi$ are also unchanged, and $p$ parameter updates necessitate that (14.11) be solved $p$ times.

For the ASAP, the transposed adjoint solution $\psi^T$ is now a $\nu \times (N-1)$ matrix whose $i$th row is the solution of

$$\bar A^T\psi_i = \bar C_i,$$

where $\bar C_i$ is the $i$th column of $\bar C$. Hence the determination of $\delta y$ using (14.16) requires the solution of $\nu$ linear systems.

In conclusion, for linear or quasi-linear systems in which adjoints can be efficiently constructed or approximated, the ASAP is more efficient than the FSAP when the number of parameters $p$ exceeds the number of responses $\nu$.
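The counting argument can be sketched by assembling the full $\nu \times p$ matrix of response derivatives both ways. The sketch assumes a hypothetical three-point discretization of (14.5), $\nu = 2$ point detectors with unit cross-section at illustrative locations, and unit variations in the $p = 3$ parameters $\Sigma_a$, $D$, and $S$ (so that $\delta C = 0$). Each column passed to a linear solve corresponds to one forward or adjoint solve.

```python
import numpy as np

a, N = 1.0, 100
Sa, D, S = 1.0, 0.5, 2.0                     # illustrative nominal values
h = 2 * a / N
x = -a + h * np.arange(1, N)
T = (2 * np.eye(N - 1) - np.eye(N - 1, k=1) - np.eye(N - 1, k=-1)) / h**2
A = Sa * np.eye(N - 1) + D * T
phi = np.linalg.solve(A, S * np.ones(N - 1))

# nu = 2 point detectors at hypothetical locations; column i of C defines response i
C = np.zeros((N - 1, 2))
for i, b in enumerate([0.5, -0.2]):
    C[np.argmin(np.abs(x - b)), i] = 1.0

# Columns of R are delta s - delta A phi for unit changes in Sigma_a, D, S (p = 3)
R = np.column_stack([-phi, -T @ phi, np.ones(N - 1)])

J_fsap = C.T @ np.linalg.solve(A, R)         # FSAP: one solve with A per parameter
Psi = np.linalg.solve(A.T, C)                # ASAP: one solve with A^T per response
J_asap = Psi.T @ R
print(J_fsap)
print("max difference:", np.max(np.abs(J_fsap - J_asap)))
```

Both routes give the same derivatives; the FSAP uses $p$ solves with $\bar A$ and the ASAP uses $\nu$ solves with $\bar A^T$.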


Adjoint Sensitivity Analysis Procedure (ASAP): Variational Approach 


Response perturbations based on adjoint solutions can also be constructed using a variational approach. As illustrated in subsequent examples, this approach requires less functional analysis and will be natural to readers familiar with variational calculus, optimal control, or sensitivity-based design. Furthermore, it can prove advantageous for problems that involve multiple constraints.

For $A\varphi = s$ with the response $y = C^T\varphi$, we consider the augmented or unconstrained response functional

$$\tilde y = y - \psi^T[A\varphi - s] = [C^T - \psi^TA]\varphi + \psi^Ts, \qquad (14.17)$$

where $\psi$ is a Lagrange multiplier. Taking Gateaux variations of (14.17) about nominal values $\bar A$, $\bar C$, and $\bar s$, in the manner illustrated in (14.9), yields the perturbations

$$\delta\tilde y = \delta C^T\varphi - \psi^T[\delta A\varphi - \delta s] - \delta\psi^T[\bar A\varphi - \bar s] + (\bar C^T - \psi^T\bar A)\delta\varphi.$$

To eliminate the dependence on the solution variations $\delta\varphi$, we enforce the relation

$$\bar C^T - \psi^T\bar A = 0,$$

which is the transpose of the adjoint sensitivity equation (14.12). We then employ the exact solution $\bar\varphi = \bar A^{-1}\bar s$ to obtain

$$\delta\tilde y = \delta C^T\bar\varphi + \psi^T[\delta s - \delta A\bar\varphi].$$

Because we use the exact solution $\bar\varphi$, it follows that $\delta\tilde y = \delta y$, as illustrated through comparison with (14.16).


14.1.2 Boundary Value Problem 

We now return to the boundary value problem (14.5), which has the solution (14.8) and response (14.6). The Gateaux variation of the response is

$$\delta y = \delta\Sigma_d\,\bar\varphi(b) + \bar\Sigma_d\,\delta\varphi(b),$$

where $\delta\Sigma_d$, $\bar\varphi(b)$, and $\bar\Sigma_d$ are known and $\delta\varphi(b)$ is unknown.
Forward Sensitivity Analysis Procedure (FSAP)

To specify variations $\delta\varphi$ in the solution, we need to construct the forward sensitivity equations. We take Gateaux variations

$$\frac{d}{d\varepsilon}\Big[(\bar\Sigma_a + \varepsilon\delta\Sigma_a)(\bar\varphi + \varepsilon\delta\varphi) - (\bar D + \varepsilon\delta D)\frac{d^2}{dx^2}(\bar\varphi + \varepsilon\delta\varphi) - (\bar S + \varepsilon\delta S)\Big]_{\varepsilon=0} = 0,$$
$$\frac{d}{d\varepsilon}(\bar\varphi + \varepsilon\delta\varphi)(\pm a)\Big|_{\varepsilon=0} = 0$$

of the system (14.5) to obtain the sensitivity equations

$$\bar\Sigma_a\delta\varphi - \bar D\frac{d^2\delta\varphi}{dx^2} = \delta S - \delta\Sigma_a\bar\varphi + \delta D\frac{d^2\bar\varphi}{dx^2}, \qquad \delta\varphi(\pm a) = 0. \qquad (14.18)$$

It is shown in [50] that the solution to (14.18) is

$$\delta\varphi(x) = C_1[\cosh(xk) - \cosh(ak)] + C_2[x\sinh(xk)\cosh(ak) - a\sinh(ak)\cosh(xk)],$$

where

$$C_1 = \frac{\delta\Sigma_a\bar S/\bar\Sigma_a - \delta S}{\bar\Sigma_a\cosh(ak)}, \qquad C_2 = \frac{(\delta D/\bar D - \delta\Sigma_a/\bar\Sigma_a)\bar S}{2\cosh^2(ak)\sqrt{\bar D\bar\Sigma_a}}, \qquad k = \sqrt{\bar\Sigma_a/\bar D}.$$

We note that, in general, one cannot construct analytic solutions to the sensitivity equations and instead must use numerical approximations. For $p$ parameters, this would require $p$ numerical solutions to incorporate variations for each parameter.
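As a check on the analytic sensitivity solution, the following sketch compares $\delta\varphi$ from (14.18) with a central finite difference of the nominal solution (14.8) along the direction $\delta q$. The nominal values and variations are illustrative assumptions, and $\Sigma_d$ is omitted because it does not enter $\varphi$.

```python
import numpy as np

def phi(x, Sa, D, S, a=1.0):
    """Nominal flux (14.8)."""
    k = np.sqrt(Sa / D)
    return S / Sa * (1 - np.cosh(k * x) / np.cosh(k * a))

def dphi(x, Sa, D, S, dSa, dD, dS, a=1.0):
    """Solution of the sensitivity equations (14.18)."""
    k = np.sqrt(Sa / D)
    C1 = (dSa * S / Sa - dS) / (Sa * np.cosh(a * k))
    C2 = (dD / D - dSa / Sa) * S / (2 * np.cosh(a * k)**2 * np.sqrt(D * Sa))
    return (C1 * (np.cosh(x * k) - np.cosh(a * k))
            + C2 * (x * np.sinh(x * k) * np.cosh(a * k) - a * np.sinh(a * k) * np.cosh(x * k)))

x = np.linspace(-1.0, 1.0, 201)
q = np.array([1.0, 0.5, 2.0])         # illustrative nominal Sigma_a, D, S
dq = np.array([0.05, -0.02, 0.08])    # illustrative variations
eps = 1e-6
fd = (phi(x, *(q + eps * dq)) - phi(x, *(q - eps * dq))) / (2 * eps)
err = np.max(np.abs(dphi(x, *q, *dq) - fd))
print(f"max |analytic - finite difference| = {err:.1e}")
```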


Adjoint Sensitivity Analysis Procedure (ASAP): Variational Approach 


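We consider the augmented response functional

$$\tilde y = y + \int_{-a}^{a}\left[\Sigma_a\varphi - D\frac{d^2\varphi}{dx^2} - S\right]v\,dx, \qquad (14.19)$$

where $v \in L^2(-a, a)$ is a Lagrange multiplier. Since $y$ can be expressed as

$$y = \Sigma_d\varphi(b) = \int_{-a}^{a}\Sigma_d\varphi\,\delta(x - b)\,dx, \qquad (14.20)$$

where $\delta(x - b)$ is the Dirac density centered at $x = b$, it follows that

$$\tilde y = \int_{-a}^{a}\left[\mathcal{H} - D\varphi''v\right]dx, \qquad (14.21)$$

where

$$\mathcal{H}(\varphi, q, v) = \Sigma_d\varphi\,\delta(x - b) + [\Sigma_a\varphi - S]v \qquad (14.22)$$

is the Hamiltonian.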
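We take the Gateaux variation of (14.21) to obtain

$$\delta\tilde y = \int_{-a}^{a}\left[\mathcal{H}_\varphi\delta\varphi + \mathcal{H}_q\delta q + (\mathcal{H}_v - D\varphi'')\delta v - \varphi''v\,\delta D - Dv\,\delta\varphi''\right]dx$$
$$= \int_{-a}^{a}\left[(\mathcal{H}_\varphi - Dv'')\delta\varphi + \mathcal{H}_q\delta q + (\mathcal{H}_v - D\varphi'')\delta v - \varphi''v\,\delta D\right]dx - Dv\,\delta\varphi'\Big|_{-a}^{a} + Dv'\,\delta\varphi\Big|_{-a}^{a},$$

where $\mathcal{H}_\varphi$, $\mathcal{H}_q$, and $\mathcal{H}_v$ are partial Gateaux derivatives with respect to $\varphi$, $q$, and $v$, evaluated at nominal parameter and solution values, and we have used the property that $\delta(\varphi'') = (\delta\varphi)''$, as developed in Exercise 14.1. First, because $\varphi(\pm a) = 0$, variations in the solution also satisfy $\delta\varphi(\pm a) = 0$, so the final term vanishes. Second, we let $v$ satisfy the adjoint boundary value problem

$$\mathcal{H}_\varphi - Dv'' = 0 \;\Longrightarrow\; Dv'' - \bar\Sigma_a v = \bar\Sigma_d\,\delta(x - b), \qquad v(\pm a) = 0, \qquad (14.23)$$

so that the boundary term $Dv\,\delta\varphi'|_{-a}^{a}$ also vanishes, and we specify $\varphi$ as a solution of $D\varphi'' = \mathcal{H}_v$ to obtain

$$\delta\tilde y = \int_{-a}^{a}\left[\mathcal{H}_q\delta q - \varphi''v\,\delta D\right]dx$$
$$= \int_{-a}^{a}\left[(\delta\Sigma_a\varphi - \delta S)v + \delta\Sigma_d\varphi\,\delta(x - b) - \varphi''v\,\delta D\right]dx$$
$$= \bar\varphi(b)\,\delta\Sigma_d + \int_{-a}^{a}\left(\delta\Sigma_a\bar\varphi - \delta S - \delta D\,\bar\varphi''\right)v\,dx. \qquad (14.24)$$

We note that once (14.23) has been solved for $v$, solutions to (14.24) can be efficiently specified in terms of the solution $\bar\varphi$ given by (14.8) and specified parameter variations $\delta q = [\delta\Sigma_a, \delta D, \delta S, \delta\Sigma_d]$.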
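The adjoint route can also be sketched numerically. The following assumes a three-point finite difference approximation of the adjoint problem (14.23), with the Dirac density replaced by $1/h$ at the grid node $x = b$, and evaluates (14.24) by the trapezoid rule. It then compares the result with $\delta\Sigma_d\bar\varphi(b) + \bar\Sigma_d\delta\varphi(b)$ computed from the analytic solution of (14.18). The grid size and parameter values are illustrative.

```python
import numpy as np

a, b = 1.0, 0.5
Sa, D, S, Sd = 1.0, 0.5, 2.0, 0.3             # illustrative nominal values
dSa, dD, dS, dSd = 0.05, -0.02, 0.08, 0.01    # illustrative variations
k = np.sqrt(Sa / D)

# Adjoint BVP (14.23): D v'' - Sa v = Sd delta(x - b), v(-a) = v(a) = 0
N = 1000
x = np.linspace(-a, a, N + 1)
h = x[1] - x[0]
xi = x[1:-1]                                   # interior nodes
T = (np.eye(N - 1, k=1) - 2 * np.eye(N - 1) + np.eye(N - 1, k=-1)) / h**2
rhs = np.zeros(N - 1)
rhs[np.argmin(np.abs(xi - b))] = Sd / h        # discrete Dirac density at x = b
v = np.linalg.solve(D * T - Sa * np.eye(N - 1), rhs)

# Nominal flux (14.8) and its second derivative on the interior nodes
phi = S / Sa * (1 - np.cosh(k * xi) / np.cosh(k * a))
phi_xx = -(S / D) * np.cosh(k * xi) / np.cosh(k * a)
phi_b = S / Sa * (1 - np.cosh(k * b) / np.cosh(k * a))

# (14.24); v vanishes at x = +-a, so the trapezoid rule reduces to h * (sum over interior nodes)
dy_adj = phi_b * dSd + h * np.sum((dSa * phi - dS - dD * phi_xx) * v)

# Forward check: dy = dSd phi(b) + Sd dphi(b) with dphi from (14.18)
C1 = (dSa * S / Sa - dS) / (Sa * np.cosh(a * k))
C2 = (dD / D - dSa / Sa) * S / (2 * np.cosh(a * k)**2 * np.sqrt(D * Sa))
dphi_b = (C1 * (np.cosh(b * k) - np.cosh(a * k))
          + C2 * (b * np.sinh(b * k) * np.cosh(a * k) - a * np.sinh(a * k) * np.cosh(b * k)))
dy_fwd = dSd * phi_b + Sd * dphi_b
print(dy_adj, dy_fwd)
```

Once `v` is available, new variations only change the integrand in (14.24), so no further linear solves are needed.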

We caution the reader that care must be exercised if employing the integral response representation (14.20) in the functional analytic framework of Section 14.2. In that framework, the Riesz representation theory is invoked to represent bounded linear functionals in terms of Hilbert space inner products. The difficulty is that $\delta$ is not bounded on $L^2$ but rather is an element of the dual space $H^{-1}$ of $H_0^1$. Since $H_0^1 \subset L^2 \subset H^{-1}$ is a Gelfand triple with dense and compact embeddings [265], density arguments and use of the duality product, rather than the $L^2$ inner product, can be used to formulate the response so that it rigorously fits in the functional analytic framework.
14.2 Functional Analytic Framework for FSAP and ASAP

We summarize here the functional analytic FSAP and ASAP framework of [50] for the general linear model

$$L(q)u = F(q), \qquad x \in \Omega,$$
$$B(q)u = G(q), \qquad x \in \partial\Omega, \qquad (14.25)$$

discussed in Section 3.3.1. Potential spatial or temporal dependence of the parameters $q(x) = [q_1(x), \ldots, q_p(x)]^T$ and states $u(x) = [u_1(x), \ldots, u_N(x)]^T$ is indicated by $x = [\mathbf{x}, t] \in \mathcal{X} \times \mathcal{T} = \Omega$, where $\mathcal{X}$ is a subset of $\mathbb{R}$, $\mathbb{R}^2$, or $\mathbb{R}^3$ and $\mathcal{T}$ is a subset of $\mathbb{R}$. Here $L(q) = [L_1(q), \ldots, L_N(q)]^T$ is a vector of operators that depend linearly on $u$ and typically nonlinearly on $q$, and $G(q)$, $B(q)$ are operators associated with initial or boundary conditions. The response or observation is represented by

$$y = \mathcal{R}(u, q) = \mathcal{R}(e), \qquad e = [u, q] \qquad (14.26)$$

in the Hilbert space $H = H_u \times H_q$, where $H_u$ and $H_q = Q$ are state and parameter spaces. Similarly, the sources $F$ are assumed to be elements in the Hilbert space $H_F$. For differential operators, $\mathrm{dom}(L)$ is typically defined to be a dense subspace of $H_u$. Examples of the operators and spaces for the models in Section 3.1 are provided in Section 3.3.1.

As in Section 14.1, we assume that we know nominal (typically mean) parameter values $\bar q$ and variations or perturbations $\delta q = h_q$. Nominal solution values $\bar u$ are computed by solving (14.25) with $q = \bar q$. Perturbations of the solution are denoted by $\delta u = h_u$, and the combined perturbations of $e$ about $\bar e = [\bar u, \bar q]$ are denoted by $h = [h_u, h_q]$.


Response Perturbations 

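In general, the Gateaux variation

$$\delta\mathcal{R}(\bar e; h) = \lim_{\varepsilon\to 0}\frac{\mathcal{R}(\bar e + \varepsilon h) - \mathcal{R}(\bar e)}{\varepsilon} = \frac{d}{d\varepsilon}\mathcal{R}(\bar e + \varepsilon h)\Big|_{\varepsilon=0}$$

need not exhibit a linear dependence on $h$ and hence on $h_u$. To facilitate the discussion, we assume that $\delta\mathcal{R}$ satisfies the necessary and sufficient linearity and continuity conditions detailed in [50], so it can be expressed as the Gateaux differential

$$\delta\mathcal{R}(\bar e; h) = \mathcal{R}'_u(\bar e)h_u + \mathcal{R}'_q(\bar e)h_q, \qquad (14.27)$$

where $\mathcal{R}'_u(\bar e)$ and $\mathcal{R}'_q(\bar e)$ denote the Gateaux partial derivatives with respect to $u$ and $q$. The term $\mathcal{R}'_q(\bar e)h_q$ depends on the known perturbations $h_q$ and is thus termed the direct effect. Because $\mathcal{R}'_u(\bar e)h_u$ depends on the solution variations $h_u = \delta u$, which have yet to be computed, it is termed the indirect effect. The FSAP and ASAP represent two techniques to compute the indirect effect.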
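To make (14.27) concrete, the following sketch evaluates the Gateaux variation of a simple hypothetical functional, $\mathcal{R}(u, q) = \int_0^1 q(x)u^2(x)\,dx$, as the derivative of $\varepsilon \mapsto \mathcal{R}(\bar e + \varepsilon h)$ at $\varepsilon = 0$ (approximated by a central difference). It compares the result with the sum of the indirect effect $\int_0^1 2\bar q\bar u h_u\,dx$ and the direct effect $\int_0^1 \bar u^2 h_q\,dx$. The functional, nominal functions, and directions are illustrative choices.

```python
import numpy as np

x = np.linspace(0.0, 1.0, 501)
w = np.full_like(x, x[1] - x[0])
w[[0, -1]] *= 0.5                               # trapezoid quadrature weights

def R(u, q):
    """Hypothetical response functional R(u, q) = int_0^1 q(x) u(x)^2 dx."""
    return np.sum(w * q * u**2)

u_bar, q_bar = np.sin(np.pi * x), 1.0 + x       # nominal e = [u, q] (illustrative)
h_u, h_q = x * (1.0 - x), np.cos(x)             # direction h = [h_u, h_q] (illustrative)

# Gateaux variation: d/d(eps) R(e + eps h) at eps = 0, via a central difference in eps
eps = 1e-6
dR_numeric = (R(u_bar + eps * h_u, q_bar + eps * h_q)
              - R(u_bar - eps * h_u, q_bar - eps * h_q)) / (2 * eps)

# Gateaux differential (14.27): indirect effect R'_u h_u plus direct effect R'_q h_q
indirect = np.sum(w * 2.0 * q_bar * u_bar * h_u)
direct = np.sum(w * u_bar**2 * h_q)
print(dR_numeric, indirect + direct)
```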
Forward Sensitivity Analysis Procedure (FSAP)

To construct sensitivity equations specifying $h_u$, we take the Gateaux variations of (14.25) at $\bar e$ in the direction $h$ to obtain

$$0 = \frac{d}{d\varepsilon}\Big[L(\bar q + \varepsilon h_q)(\bar u + \varepsilon h_u) - F(\bar q + \varepsilon h_q)\Big]_{\varepsilon=0}$$
$$= \Big\{L(\bar q + \varepsilon h_q)h_u + [L'_q(\bar q + \varepsilon h_q)(\bar u + \varepsilon h_u)]h_q - \delta F(\bar q + \varepsilon h_q; h_q)\Big\}_{\varepsilon=0}$$
$$= L(\bar q)h_u + [L'_q(\bar q)\bar u]h_q - \delta F(\bar q; h_q),$$

where $L'_q(\bar q)$ denotes the partial Gateaux derivative of $L$ at $\bar q$. Similar treatment of the boundary conditions and enforcement of the stationary condition for Gateaux variations yield the forward sensitivity equations

$$L(\bar q)h_u + [L'_q(\bar q)\bar u]h_q = \delta F(\bar q; h_q), \qquad x \in \Omega,$$
$$B(\bar q)h_u + [B'_q(\bar q)\bar u]h_q = \delta G(\bar q; h_q), \qquad x \in \partial\Omega. \qquad (14.28)$$

It was illustrated in the examples of Section 14.1 that whereas the solution of these equations yields the sensitivities $h_u = \delta u$ necessary to compute the indirect effect $\mathcal{R}'_u(\bar e)h_u$, they must be re-solved for each computed or updated parameter $q_i$, $i = 1, \ldots, p$. If the number of parameters $p$ is larger than the number of responses $\nu$, it is more efficient to employ the ASAP, provided the adjoints can be constructed or approximated in an efficient manner.


Adjoint Sensitivity Analysis Procedure (ASAP) 

We consider the case when there is a single observation so that $\nu = 1$ and $\mathcal{R}$ is a functional. We refer the reader to [50] for the general case of nonlinear response operators.

The Riesz Representation Theorem A.5 guarantees that every bounded linear functional on a Hilbert space can be uniquely represented in terms of the inner product. Since $\mathcal{R}'_u(\bar e)h_u$ is linear with regard to the solution perturbations $h_u$ (and assumed to be bounded), it follows that there exists a unique element $\nabla_u\mathcal{R}(\bar e) \in H_u$ such that

$$\mathcal{R}'_u(\bar e)h_u = \langle\nabla_u\mathcal{R}(\bar e), h_u\rangle_u \qquad (14.29)$$

for all $h_u \in H_u$.

We know from the examples of Section 14.1 that the adjoint solution plays a fundamental role in the ASAP. From (A.4) of Appendix A, it follows that $L$ and its formal adjoint $L^*$ are related by

$$\langle L(\bar q)h_u, \psi\rangle_F = \langle h_u, L^*(\bar q)\psi\rangle_u + P(h_u, \psi)\big|_{\partial\Omega}, \qquad (14.30)$$

where $P(h_u, \psi)|_{\partial\Omega}$ is a bilinear form, evaluated on $\partial\Omega$, that arises through integration by parts in the case of differential operators $L$. For unbounded operators (e.g., differential operators), the specification of the domain, including boundary conditions, is necessary to fully define the operator. For differential operators, we consider adjoint boundary conditions

$$B^*(\psi; \bar q) = A^*(\bar q), \qquad x \in \partial\Omega, \qquad (14.31)$$

that are constructed to satisfy the following requirements.

Adjoint Boundary and Initial Condition Requirements

(i) The boundary conditions (14.31) do not include $h_u$, $h_q$, or Gateaux derivatives with respect to $q$.

(ii) All terms containing the unknown variations $h_u = \delta u$ must vanish when the sensitivity and adjoint boundary conditions are substituted into $P(h_u, \psi)|_{\partial\Omega}$.

The boundary terms that remain are denoted by $\hat P(h_q, \psi; \bar q)$.

We are now ready to complete the procedure. Since $\nabla_u\mathcal{R}(\bar e)$ is unique, we impose the condition

$$L^*(\bar q)\psi = \nabla_u\mathcal{R}(\bar e), \qquad (14.32)$$

along with (14.31), to obtain an adjoint system and reduced boundary relations $\hat P(h_q, \psi; \bar q)$. By employing the sensitivity equations (14.28), it follows that

$$\langle L(\bar q)h_u, \psi\rangle_F = -\langle[L'_q(\bar q)\bar u]h_q, \psi\rangle_F + \langle\delta F(\bar q; h_q), \psi\rangle_F, \qquad (14.33)$$

where the right-hand side is independent of $h_u = \delta u$. Combination of (14.29), (14.30), (14.32), and (14.33) thus yields

$$\mathcal{R}'_u(\bar e)h_u = \langle\delta F(\bar q; h_q) - [L'_q(\bar q)\bar u]h_q, \psi\rangle_F - \hat P(h_q, \psi; \bar q), \qquad (14.34)$$

so that response perturbations are

$$\delta\mathcal{R}(\bar e; h) = \mathcal{R}'_q(\bar e)h_q + \langle\delta F(\bar q; h_q) - [L'_q(\bar q)\bar u]h_q, \psi\rangle_F - \hat P(h_q, \psi; \bar q). \qquad (14.35)$$

Once $\psi$ has been determined by solving (14.32), all of the terms in (14.35) are known so that response uncertainties $\delta\mathcal{R}$ can be computed directly.

We now illustrate the functional analytic FSAP and ASAP in examples.

14.2.1 Neutron Diffusion — Matrix Systems 

We revisit the model of Section 14.1.1 that had the response

$$y = \mathcal{R}(\varphi, q) = C^T(q)\varphi$$

subject to the matrix constraint

$$A(q)\varphi = s(q).$$

As detailed in Section 14.1.1 and Example A.11, $L = A$ and $L^* = A^T$ for this example. Furthermore, $u = \varphi$ and $H_F = \mathbb{R}^{N-1}$ with the Euclidean dot product.
The relation (14.33) yields

$$\langle\bar A\delta\varphi, \psi\rangle = \langle\delta s - \delta A\bar\varphi, \psi\rangle$$

since $L(\bar q)h_u = \bar A\delta\varphi$ and $[L'_q(\bar q)\bar u]h_q = [A'_q(\bar q)\bar\varphi]\delta q = \delta A\bar\varphi$. This is precisely (14.13).

The adjoint constraint (14.32) yields

$$\bar A^T\psi = \bar C,$$

which is (14.12). Finally, the response perturbation relation (14.35) yields

$$\delta y = \delta\mathcal{R} = \delta C^T\bar\varphi + \psi^T[\delta s - \delta A\bar\varphi]$$

since there are no boundary terms $P$. This is exactly the response perturbation (14.16). We point out that these relations are also identical to those obtained using the variational adjoint approach.


14.2.2 Spring Model

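To illustrate the functional analytic framework in the context of a differential operator, we revisit the spring model

$$\frac{d^2z}{dt^2} + Kz = 0, \qquad z(0) = z_0, \quad \frac{dz}{dt}(0) = z_1 \qquad (14.36)$$

of Example 14.3 with the response or observation

$$y = \mathcal{R}(z, q) = \int_0^{t_f}\gamma(t)z(t)\,dt. \qquad (14.37)$$

Here $u = z$ is considered in $H_u = L^2(0, t_f)$ with the standard inner product. The parameters are

$$q = [K, z_0, z_1, \gamma] \in (0, \infty) \times \mathbb{R} \times \mathbb{R} \times L^2(0, t_f). \qquad (14.38)$$

With the definitions

$$L(q)u = \ddot z + Kz, \qquad F(q) = 0, \qquad B(q)u = \begin{bmatrix} z(0) \\ \dot z(0)\end{bmatrix}, \qquad G(q) = \begin{bmatrix} z_0 \\ z_1\end{bmatrix}, \qquad (14.39)$$

the model can be posed in the general operator framework (14.25). Finally, we take $H_F = L^2(0, t_f)$ with the standard inner product.

Response Perturbations

The Gateaux differential of (14.37), evaluated at $\bar\gamma(t)$ and $\bar z(t)$, is

$$\delta\mathcal{R} = \int_0^{t_f}\bar\gamma(t)\delta z(t)\,dt + \int_0^{t_f}\delta\gamma(t)\bar z(t)\,dt. \qquad (14.40)$$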
The nominal solution is

$$\bar z(t) = \bar z_0\cos\omega_0 t + \frac{\bar z_1}{\omega_0}\sin\omega_0 t, \qquad (14.41)$$

where $\omega_0 = \bar K^{1/2}$. The goal in the FSAP and ASAP is to determine the contribution of the solution perturbations $\delta z(t)$ to (14.40).

Forward Sensitivity Analysis Procedure (FSAP) 

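Taking Gateaux variations of (14.36) yields the sensitivity equations

$$\frac{d^2\delta z}{dt^2} + \bar K\delta z = -\delta K\bar z, \qquad \delta z(0) = \delta z_0, \quad \frac{d\delta z}{dt}(0) = \delta z_1. \qquad (14.42)$$

Note that these relations can be obtained from (14.28) using the operator definitions (14.39) since

$$L(\bar q)h_u + [L'_q(\bar q)\bar u]h_q = \frac{d^2\delta z}{dt^2} + \bar K\delta z + \delta K\bar z.$$

By employing the nominal solution $\bar z$ given by (14.41), we obtain the analytic sensitivity solution

$$\delta z(t) = \left[-\frac{\bar z_0}{2\omega_0}t\sin\omega_0 t + \frac{\bar z_1}{2\omega_0^2}t\cos\omega_0 t - \frac{\bar z_1}{2\omega_0^3}\sin\omega_0 t\right]\delta K + \cos\omega_0 t\,\delta z_0 + \frac{1}{\omega_0}\sin\omega_0 t\,\delta z_1. \qquad (14.43)$$

This can be employed in (14.40) to obtain the response perturbations or uncertainties. The individual sensitivities are thus

$$\frac{\partial z}{\partial K} = -\frac{\bar z_0}{2\omega_0}t\sin\omega_0 t + \frac{\bar z_1}{2\omega_0^2}t\cos\omega_0 t - \frac{\bar z_1}{2\omega_0^3}\sin\omega_0 t,$$
$$\frac{\partial z}{\partial z_0} = \cos\omega_0 t, \qquad \frac{\partial z}{\partial z_1} = \frac{1}{\omega_0}\sin\omega_0 t,$$

which are precisely the relations that result from differentiating (14.41) with respect to the parameters.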
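The analytic sensitivity solution (14.43) can be checked against a directional central difference of the nominal solution (14.41), since $\delta z$ is the derivative of $z$ along $\delta q = [\delta K, \delta z_0, \delta z_1]$. The nominal values and variations below are illustrative.

```python
import numpy as np

def z(t, K, z0, z1):
    """Nominal solution (14.41) with omega_0 = sqrt(K)."""
    w = np.sqrt(K)
    return z0 * np.cos(w * t) + z1 / w * np.sin(w * t)

def dz(t, K, z0, z1, dK, dz0, dz1):
    """Sensitivity solution (14.43)."""
    w = np.sqrt(K)
    dz_dK = (-z0 * t * np.sin(w * t) / (2 * w)
             + z1 * t * np.cos(w * t) / (2 * w**2)
             - z1 * np.sin(w * t) / (2 * w**3))
    return dz_dK * dK + np.cos(w * t) * dz0 + np.sin(w * t) / w * dz1

t = np.linspace(0.0, 10.0, 1001)
q = np.array([20.0, 2.0, -1.0])      # illustrative nominal K, z0, z1
dq = np.array([0.5, 0.1, 0.2])       # illustrative variations dK, dz0, dz1

# Directional central difference of (14.41) along dq
eps = 1e-6
fd = (z(t, *(q + eps * dq)) - z(t, *(q - eps * dq))) / (2 * eps)
err = np.max(np.abs(dz(t, *q, *dq) - fd))
print(f"max |(14.43) - finite difference| = {err:.1e}")
```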

Adjoint Sensitivity Analysis Procedure (ASAP): Variational Approach 

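To form an unconstrained response, we employ the augmented functional

$$\tilde y = \tilde{\mathcal{R}} = \mathcal{R}(z, q) - \int_0^{t_f}\left[\ddot z + Kz\right]\psi\,dt = \int_0^{t_f}\left[\mathcal{H}(q, z, \psi) - \ddot z\,\psi\right]dt,$$

where the Hamiltonian is

$$\mathcal{H}(q, z, \psi) = (\gamma - K\psi)z.$$

We note that subtraction of the constraint facilitates comparison with the functional analytic approach. However, one can just as easily add the constraint, which yields the same final result.

The Gateaux variation is thus

$$\delta\tilde{\mathcal{R}} = \int_0^{t_f}\left[\mathcal{H}_z\delta z + \mathcal{H}_q\delta q + (\mathcal{H}_\psi - \ddot z)\delta\psi - \psi\,\delta\ddot z\right]dt$$
$$= \int_0^{t_f}\left[(\mathcal{H}_z - \ddot\psi)\delta z + \mathcal{H}_q\delta q + (\mathcal{H}_\psi - \ddot z)\delta\psi\right]dt - \psi\,\delta\dot z\Big|_0^{t_f} + \dot\psi\,\delta z\Big|_0^{t_f}, \qquad (14.44)$$

where $\delta z(0) = \delta z_0$ and $\delta\dot z(0) = \delta z_1$ are known whereas $\delta z(t_f)$ and $\delta\dot z(t_f)$ are unknown, and $\mathcal{H}_z$, $\mathcal{H}_q$, $\mathcal{H}_\psi$ are partial Gateaux derivatives of $\mathcal{H}$ with respect to $z$, $q$, and $\psi$. We have also used the property that $\delta\ddot z = \frac{d^2}{dt^2}\delta z$, as developed in Exercise 14.1.

To eliminate the unknown boundary terms at $t_f$, we enforce the conditions $\psi(t_f) = \dot\psi(t_f) = 0$ in the adjoint problem

$$\ddot\psi - \mathcal{H}_z = 0 \;\Longrightarrow\; \ddot\psi + \bar K\psi = \bar\gamma, \qquad \psi(t_f) = \dot\psi(t_f) = 0. \qquad (14.45)$$

Furthermore, we employ the solution $\bar z$ to the state relation $\ddot z - \mathcal{H}_\psi = 0$ to obtain

$$\delta\tilde{\mathcal{R}} = \int_0^{t_f}\left[\bar z\,\delta\gamma - \bar z\psi\,\delta K\right]dt + \psi(0)\delta z_1 - \dot\psi(0)\delta z_0. \qquad (14.46)$$

Finally, we note that by employing the nominal solution $z = \bar z$, it follows that $\delta\tilde{\mathcal{R}} = \delta\mathcal{R}$. By comparing with (14.40), we see that

$$\int_0^{t_f}\bar\gamma(t)\delta z(t)\,dt = -\int_0^{t_f}\bar z(t)\psi(t)\,\delta K\,dt + \psi(0)\delta z_1 - \dot\psi(0)\delta z_0$$

in the response perturbation. The advantage of the adjoint formulation is that the adjoint problem (14.45) must be solved only once, whereas $\delta z$ given by (14.43) must be recomputed for each change in $\delta z_0$, $\delta z_1$, or $\delta K$.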
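The following sketch carries out this adjoint computation numerically under illustrative assumptions: $\bar\gamma(t) = 1$, a hypothetical variation $\delta\gamma(t) = 0.1t$, and the listed nominal values and variations. It integrates the adjoint problem (14.45) backward from $t_f$ with a fixed-step fourth-order Runge-Kutta method, evaluates (14.46) with the trapezoid rule, and compares the result with (14.40) evaluated using the forward sensitivity (14.43).

```python
import numpy as np

K, z0, z1, tf = 20.0, 2.0, -1.0, 5.0         # illustrative nominal values
dK, dz0, dz1 = 0.5, 0.1, 0.2                 # illustrative variations
w = np.sqrt(K)
gamma = lambda s: 1.0 + 0.0 * s              # assumed weight gamma(t) = 1
n = 5000
t = np.linspace(0.0, tf, n + 1)
dt = t[1] - t[0]

def rhs(s, y):
    # First-order form of (14.45) with y = [psi, psi']
    return np.array([y[1], gamma(s) - K * y[0]])

# Backward RK4 from the terminal conditions psi(tf) = psi'(tf) = 0
Y = np.zeros((n + 1, 2))
for j in range(n, 0, -1):
    s, y, hs = t[j], Y[j], -dt
    k1 = rhs(s, y)
    k2 = rhs(s + hs / 2, y + hs / 2 * k1)
    k3 = rhs(s + hs / 2, y + hs / 2 * k2)
    k4 = rhs(s + hs, y + hs * k3)
    Y[j - 1] = y + hs / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
psi, dpsi = Y[:, 0], Y[:, 1]

def trap(f):
    # Composite trapezoid rule on the uniform grid t
    return dt * (np.sum(f) - 0.5 * (f[0] + f[-1]))

zbar = z0 * np.cos(w * t) + z1 / w * np.sin(w * t)   # nominal solution (14.41)
dgamma = 0.1 * t                                      # hypothetical delta gamma

# (14.46): the same psi serves any dK, dz0, dz1, dgamma
dR_adj = trap(zbar * dgamma - zbar * psi * dK) + psi[0] * dz1 - dpsi[0] * dz0

# Check with (14.40) using the forward sensitivity (14.43)
dzdK = (-z0 * t * np.sin(w * t) / (2 * w) + z1 * t * np.cos(w * t) / (2 * w**2)
        - z1 * np.sin(w * t) / (2 * w**3))
dz = dzdK * dK + np.cos(w * t) * dz0 + np.sin(w * t) / w * dz1
dR_fwd = trap(gamma(t) * dz) + trap(dgamma * zbar)
print(dR_adj, dR_fwd)
```

For this constant weight, the adjoint also has the closed form $\psi(t) = [1 - \cos(\omega_0(t_f - t))]/\bar K$, which can be used to check the integrator.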

Adjoint Sensitivity Analysis Procedure (ASAP): Perturbation Approach 

To illustrate the functional analytic approach, we note from the operator definitions (14.39) that the adjoint relation (14.30) yields

$$\int_0^{t_f}\left(\delta\ddot z + \bar K\delta z\right)\psi\,dt = \int_0^{t_f}\left(\ddot\psi + \bar K\psi\right)\delta z\,dt + \left[\psi\,\delta\dot z - \dot\psi\,\delta z\right]_0^{t_f}. \qquad (14.47)$$

To eliminate the unknown terms at $t_f$, we enforce the adjoint boundary conditions

$$\psi(t_f) = \dot\psi(t_f) = 0$$

indicated in operator form in (14.31). The remaining boundary terms are thus

$$\hat P = -\psi(0)\delta z_1 + \dot\psi(0)\delta z_0.$$

To specify the right-hand side of the adjoint equation, we note that the indirect effect in the perturbation response is

$$\mathcal{R}'_u(\bar e)h_u = \int_0^{t_f}\bar\gamma\,\delta z\,dt = \langle\bar\gamma, \delta z\rangle_u,$$

so that $\nabla_u\mathcal{R}(\bar e) = \bar\gamma$ and the adjoint system is

$$\ddot\psi + \bar K\psi = \bar\gamma, \qquad \psi(t_f) = \dot\psi(t_f) = 0. \qquad (14.48)$$

By using the sensitivity equation (14.42) to replace the left-hand side of (14.47), it follows that

$$-\int_0^{t_f}\delta K\,\bar z\psi\,dt = \int_0^{t_f}\bar\gamma\,\delta z\,dt - \psi(0)\delta z_1 + \dot\psi(0)\delta z_0,$$

so that

$$\delta\mathcal{R} = \int_0^{t_f}\bar\gamma(t)\delta z(t)\,dt + \int_0^{t_f}\delta\gamma(t)\bar z(t)\,dt = \int_0^{t_f}\left[\bar z\,\delta\gamma - \bar z\psi\,\delta K\right]dt + \psi(0)\delta z_1 - \dot\psi(0)\delta z_0, \qquad (14.49)$$

which is precisely the relation (14.46) obtained through variational analysis. We note that (14.49) is given by (14.35) with

$$\mathcal{R}'_q(\bar e)h_q - \langle[L'_q(\bar q)\bar u]h_q, \psi\rangle_F = \int_0^{t_f}\delta\gamma\,\bar z\,dt - \int_0^{t_f}\delta K\,\bar z\psi\,dt.$$

Remark 14.4. If the response $\mathcal{R}$ is posed in terms of the inner product for $H_u$, as is the case in this example, then the two approaches are actually the same. The difference arises for problems in which the Riesz representation theorem must be invoked to represent response functionals, such as point evaluations, that are not initially posed in terms of the Hilbert space inner product.


14.3 Notes and References 

The FSAP and perturbation approach to the ASAP follow closely the development in [50, 53], and the reader is referred to those references for additional details and applications of the techniques. To simplify the discussion, we have focused solely on linear problems where the adjoint problem can be solved independently from the forward problem. The reader is again referred to [50, 53] for sensitivity analysis of nonlinear systems where the forward and adjoint problems are coupled, which requires simultaneous solution techniques.
14.4 Exercises

Exercise 14.1. Use the definition of the Gateaux variation to show that $\delta\left(\frac{dz}{dt}\right) = \frac{d}{dt}(\delta z)$ and that $\delta\left(\frac{d^2z}{dt^2}\right) = \frac{d^2}{dt^2}(\delta z)$.


Exercise 14.2. Consider the initial value problem

$$\frac{dz}{dt} = az^2 + bz, \qquad z(0) = z_0$$

and response

$$y = \mathcal{R}(z, q) = \int_0^{t_f}\left[kz^2(t) + rz(t)\right]dt,$$

where $a$, $b$, and $z_0$ are scalars and $q = [a, b, z_0]$. You can treat $k$ and $r$ as known, fixed parameters.

Determine the adjoint equation, along with an appropriate boundary condition, and specify the response variation $\delta\mathcal{R}$.


Exercise 14.3. Exercises 7.1-7.5 illustrate additional facets of local sensitivity analysis.





Chapter 15 


Global Sensitivity Analysis 


As detailed in Definition 14.2, one objective of global sensitivity analysis is to quantify how uncertainties in model outputs can be apportioned to uncertainties in model inputs that are considered over the entire range of input values. Unlike local sensitivity analysis, where inputs are varied about a nominal value, global sensitivity analysis considers uncertainties due to combinations of parameters throughout the admissible parameter space. In both cases, analysis focuses solely on properties of the model and does not rely on experimental data. The determination of parameter and output uncertainties using experimental data constitutes the complementary processes of model calibration and uncertainty propagation.

Global sensitivity analysis is often used to determine noninfluential parameters in nonlinearly parameterized models, which can be fixed for subsequent model calibration or uncertainty propagation. Hence these techniques provide the basis for the parameter selection methods of Section 6.2.


The differences between global and local sensitivity analysis and the manner in which global sensitivity is evaluated are illustrated in the next example.


Example 15.1. Consider the linear portfolio model

$$Y = c_1Q_1 + c_2Q_2 \qquad (15.1)$$

discussed in [215]. Here $Q_1$ and $Q_2$ represent hedged portfolios, and $c_1$ and $c_2$ are the amounts invested in each portfolio. For example, each portfolio could be comprised of options and stocks with associated risks. We initially assume that the average return from each portfolio is zero and that they are independent and normally distributed with $Q_1 \sim N(0, \sigma_1^2)$ and $Q_2 \sim N(0, \sigma_2^2)$, where $\sigma_1 = 1$ and $\sigma_2 = 3$. In this example, we take $c_1$ and $c_2$ to be constant with $c_1 = 2$, $c_2 = 1$. The random variable $Y$ is the return for the investment, although it is often termed the risk if it is negative. From (4.18), it follows that $Y \sim N(0, \sigma_Y^2)$, where

$$\sigma_Y^2 = c_1^2\sigma_1^2 + c_2^2\sigma_2^2 = 13. \qquad (15.2)$$

Since $\sigma_2 > \sigma_1$, the second portfolio is said to be more volatile than the first. To illustrate the effects of portfolio uncertainty on the return, scatterplots obtained with 1000 joint realizations of $q_1$, $q_2$, and $y$ are plotted in Figure 15.1.

Figure 15.1. Scatterplots of $y$ versus (a) $q_1$ and (b) $q_2$ constructed using 1000 joint realizations.

These plots indicate that $Q_2$ has more influence on $Y$ than $Q_1$ since the realizations $(q_2, y)$ clearly reflect the trend of the model (15.1), whereas the realizations $(q_1, y)$ and $(-q_1, y)$ are nearly identical and there is no clear trend for the nearly uniform scatterplot. From a global perspective, $Y$ is thus considered more sensitive to $Q_2$ than $Q_1$.
In contrast, the local derivative relation

$$S_i = \frac{\partial Y}{\partial Q_i} = c_i \qquad (15.3)$$

yields $S_1 = 2 > S_2 = 1$, which reflects the amounts invested in the two portfolios rather than the effects of their volatility on the return. Hence the local technique does not accommodate the nonlinear uncertainty structure over the global admissible parameter space $\mathcal{Q} = \mathbb{R}^2$.

Alternatively, one can employ the sigma-normalized relations

$$S_i^\sigma = \frac{\sigma_i}{\sigma_Y}\frac{\partial Y}{\partial Q_i} = c_i\frac{\sigma_i}{\sigma_Y}, \qquad (15.4)$$

which are hybrid local-global in nature since $\sigma_i$ incorporates variability over the range of input values. Here

$$S_1^\sigma = \frac{2}{\sqrt{13}}, \qquad S_2^\sigma = \frac{3}{\sqrt{13}}, \qquad (15.5)$$

which is consistent with the scatterplot information in Figure 15.1. From the definition, it follows immediately that

$$(S_1^\sigma)^2 + (S_2^\sigma)^2 = 1,$$

so that each squared relation $(S_i^\sigma)^2$ quantifies the contribution of that individual factor to the variance of the output or QoI. We note that the relations (15.4) constitute one technique recommended for global sensitivity analysis by the 1999 and 2000 Intergovernmental Panel on Climate Change (IPCC) [115]; see also Section 2.2.
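A short Monte Carlo sketch reproduces these quantities. The sample size and random seed are arbitrary choices. The squared sample correlation between each input and $Y$ is used only as an empirical check on $(S_i^\sigma)^2$, which is valid here because the model is linear and the inputs are independent.

```python
import numpy as np

rng = np.random.default_rng(2024)            # arbitrary seed
c = np.array([2.0, 1.0])                     # amounts invested c_1, c_2
sigma = np.array([1.0, 3.0])                 # standard deviations sigma_1, sigma_2
M = 100_000                                  # number of joint realizations

Q = rng.standard_normal((M, 2)) * sigma      # independent Q_i ~ N(0, sigma_i^2)
Y = Q @ c                                    # portfolio model (15.1)

sigma_Y = np.sqrt(np.sum(c**2 * sigma**2))   # (15.2): sqrt(13)
S_local = c                                  # (15.3): dY/dQ_i = c_i
S_sigma = c * sigma / sigma_Y                # (15.4)
# For this linear model with independent inputs, corr(Q_i, Y)^2 estimates (S_i^sigma)^2
shares = np.array([np.corrcoef(Q[:, i], Y)[0, 1]**2 for i in range(2)])

print("std(Y):", Y.std(), "vs", sigma_Y)
print("local:", S_local, " sigma-normalized:", S_sigma)
print("sum of squares:", np.sum(S_sigma**2), " Monte Carlo shares:", shares)
```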

The model (15.1) is monotone and additive in the sense defined next.

Definition 15.2. A model $Y = f(Q_1, \ldots, Q_p)$ is additive if it can be expressed as

$$Y = \sum_{i=1}^p f_i(Q_i).$$

We now discuss the construction of global sensitivity measures that can be applied to nonmonotone models or models with parameter interactions.


15.1 Variance-Based Methods

15.1.1 Sobol Decomposition for Uniform Densities


Consider the scalar-valued, nonlinear model

$$Y = f(Q), \qquad (15.6)$$

where $Q = [Q_1, \ldots, Q_p] \in \mathbb{R}^p$. We initially assume that the random variables are independent and uniformly distributed on $[0, 1]$ so that

$$Q_i \sim \mathcal{U}(0, 1), \qquad \Gamma = [0, 1]^p.$$

The case of general densities is discussed in Section 15.1.2. We consider the second-order HDMR or Sobol expansion

$$f(q) = f_0 + \sum_{i=1}^p f_i(q_i) + \sum_{1 \le i < j \le p} f_{ij}(q_i, q_j) \qquad (15.7)$$

discussed in Section 13.5 subject to the condition

$$\int_0^1 f_i(q_i)\,dq_i = \int_0^1 f_{ij}(q_i, q_j)\,dq_i = \int_0^1 f_{ij}(q_i, q_j)\,dq_j = 0, \qquad (15.8)$$

which ensures that the functions are orthogonal in the sense that

$$\int_{\Gamma} f_i(q_i)f_j(q_j)\,dq = 0, \qquad i \ne j, \qquad (15.9)$$

for $i, j = 1, \ldots, p$. As detailed in [200, 227], the zeroth-, first-, and second-order terms can then be expressed as

$$f_0 = \int_{\Gamma} f(q)\,dq,$$
$$f_i(q_i) = \int_{\Gamma^{p-1}} f(q)\,dq_{\sim i} - f_0, \qquad (15.10)$$
$$f_{ij}(q_i, q_j) = \int_{\Gamma^{p-2}} f(q)\,dq_{\sim\{ij\}} - f_i(q_i) - f_j(q_j) - f_0,$$




Chapter 15. Global Sensitivity Analysis 


where $\Gamma^{p-1} = [0, 1]^{p-1}$ and $\Gamma^{p-2} = [0, 1]^{p-2}$. Recall that the notation $q_{\sim i}$ denotes the vector having all the components of $q$ except those in the set $i$; for example,

$$ q_{\sim i} = [q_1, \ldots, q_{i-1}, q_{i+1}, \ldots, q_p]. \qquad (15.11) $$

The expressions (15.10) are precisely those in (13.54) for ANOVA-HDMR.

The total variance $D$ of the response $Y$ is given by

$$ D = \mathrm{var}(Y) = \int_{\Gamma} f^2(q)\, dq - f_0^2 \qquad (15.12) $$

since $f_0 = \mathbb{E}(Y)$, as detailed in Remark 15.3. By employing the expansion (15.7) and enforcing the orthogonality conditions (15.8) and (15.9), the total variance can be expressed as

$$ D = \sum_{i=1}^{p} D_i + \sum_{1 \le i < j \le p} D_{ij}, $$

where the partial variances are

$$ D_i = \int_0^1 f_i^2(q_i)\, dq_i, \qquad D_{ij} = \int_0^1 \!\! \int_0^1 f_{ij}^2(q_i, q_j)\, dq_i\, dq_j. \qquad (15.13) $$

The Sobol indices are defined to be

$$ S_i = \frac{D_i}{D}, \qquad S_{ij} = \frac{D_{ij}}{D}, \qquad i, j = 1, \ldots, p, $$

so, by definition, they satisfy

$$ \sum_{i=1}^{p} S_i + \sum_{1 \le i < j \le p} S_{ij} = 1. $$

The terms $S_i$ are often termed the importance measures or first-order sensitivity indices, and large values of $S_i$ indicate parameters that strongly influence the response variance. Similarly, the $S_{ij}$ account for the influence of interaction terms.

Because the number of first- and second-order Sobol indices is $p + p(p-1)/2 = p(p+1)/2$, their analysis quickly becomes untenable for large parameter dimensions. This motivates the consideration of total sensitivity indices

$$ S_{T_i} = S_i + \sum_{\substack{j = 1 \\ j \ne i}}^{p} S_{ij}, \qquad (15.14) $$

with the convention $S_{ji} = S_{ij}$. These indices quantify the total effect of the parameter $Q_i$ on the response $Y$.
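The decomposition (15.10), the partial variances (15.13), and the indices (15.14) can be checked numerically for a small test function. The sketch below uses a hypothetical helper, `anova_2d`, which approximates every integral with the midpoint rule on an $n \times n$ grid over $[0,1]^2$. The test function $f(q) = q_1 + q_2^2 + q_1 q_2$ is an assumption. It was chosen because integrating by hand gives $f_{12}(q_1, q_2) = (q_1 - 1/2)(q_2 - 1/2)$, $D_1 = 3/16$, $D_2 = 139/720$, and $D_{12} = 1/144$. Tensor-grid quadrature of this kind is only practical for very small $p$, which is the limitation discussed above.

```python
def anova_2d(f, n=200):
    """Second-order Sobol/HDMR decomposition (15.10) of f(q1, q2), Q_1, Q_2 ~ U(0, 1),
    using the midpoint rule on an n x n grid; returns partial variances (15.13)
    and the Sobol and total indices of (15.14)."""
    nodes = [(k + 0.5) / n for k in range(n)]
    F = [[f(a, b) for b in nodes] for a in nodes]              # F[i][j] = f(q1_i, q2_j)
    f0 = sum(map(sum, F)) / n**2                                # f_0
    f1 = [sum(row) / n - f0 for row in F]                       # f_1: integrate out q2
    f2 = [sum(F[i][j] for i in range(n)) / n - f0 for j in range(n)]   # f_2: integrate out q1
    D1 = sum(v * v for v in f1) / n
    D2 = sum(v * v for v in f2) / n
    D12 = sum((F[i][j] - f1[i] - f2[j] - f0) ** 2               # f_12 = f - f_1 - f_2 - f_0
              for i in range(n) for j in range(n)) / n**2
    D = sum((F[i][j] - f0) ** 2 for i in range(n) for j in range(n)) / n**2   # (15.12)
    return {"D": D, "D1": D1, "D2": D2, "D12": D12,
            "S1": D1 / D, "S2": D2 / D, "S12": D12 / D,
            "ST1": (D1 + D12) / D, "ST2": (D2 + D12) / D}

res = anova_2d(lambda q1, q2: q1 + q2 ** 2 + q1 * q2)
print(res["D1"], res["D2"], res["D12"])                    # near 3/16, 139/720, 1/144
print(res["D"] - (res["D1"] + res["D2"] + res["D12"]))     # ~0, i.e. D = D1 + D2 + D12
```

On the grid the component functions are exactly orthogonal, so the variance identity holds up to rounding. The quadrature error only affects how closely $D_1$, $D_2$, and $D_{12}$ match their exact values.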
Remark 15.3. The expansion terms, partial variances, and Sobol indices all have expectation or variance interpretations. To set notation, we let

$$ \mathbb{E}(Y \mid q_i) = \int_{\Gamma^{p-1}} f(q)\, dq_{\sim i}, \qquad \mathbb{E}(Y \mid q_i, q_j) = \int_{\Gamma^{p-2}} f(q)\, dq_{\sim\{ij\}} \qquad (15.15) $$

denote the expected responses when the components $q_i$ and $q_i, q_j$ are fixed. From (15.10), it follows that

$$ \begin{aligned} f_0 &= \mathbb{E}(Y), \\ f_i(q_i) &= \mathbb{E}(Y \mid q_i) - f_0, \\ f_{ij}(q_i, q_j) &= \mathbb{E}(Y \mid q_i, q_j) - f_i(q_i) - f_j(q_j) - f_0. \end{aligned} \qquad (15.16) $$

Since

$$ \mathbb{E}[\mathbb{E}(Y \mid q_i)] = \int_0^1 \left[ \int_{\Gamma^{p-1}} f(q)\, dq_{\sim i} \right] dq_i = f_0, $$

it follows that

$$ D_i = \mathrm{var}[\mathbb{E}(Y \mid q_i)] \qquad (15.17) $$

and hence

$$ S_i = \frac{\mathrm{var}[\mathbb{E}(Y \mid q_i)]}{\mathrm{var}(Y)}. \qquad (15.18) $$

Similarly, one can show that

$$ D_{ij} = \mathrm{var}[\mathbb{E}(Y \mid q_i, q_j)] - \mathrm{var}[\mathbb{E}(Y \mid q_i)] - \mathrm{var}[\mathbb{E}(Y \mid q_j)], $$

which yields a variance interpretation for $S_{ij}$. Finally, the total sensitivity index has the interpretation

$$ S_{T_i} = 1 - \frac{\mathrm{var}[\mathbb{E}(Y \mid q_{\sim i})]}{\mathrm{var}(Y)} = \frac{\mathbb{E}[\mathrm{var}(Y \mid q_{\sim i})]}{\mathrm{var}(Y)}. \qquad (15.19) $$

The interpretation of $\mathbb{E}(Y \mid q_i)$ and $\mathrm{var}[\mathbb{E}(Y \mid q_i)]$ is further illustrated in Figure 15.2 using scatterplots for the linear portfolio model (15.1). The conditional expectations for fixed $q_1$ and $q_2$ are the average values of $Y$ along vertical slices. The partial variances $D_i$ quantify the variability of these average values. For this example, $D_2$ is significantly larger than $D_1$, thus quantifying the structure observed in the figures. The Sobol indices quantify this relative influence on the scale $[0, 1]$.

Remark 15.4. From (15.19), we note that if $S_{T_i} = 0$, then $\mathbb{E}[\mathrm{var}(Y \mid q_{\sim i})] = 0$, which, by the nonnegativity of the variance operator, implies that $\mathrm{var}(Y \mid q_{\sim i}) = 0$ for any admissible value of $q_{\sim i}$. The condition $S_{T_i} = 0$ thus implies that $Q_i$ is noninfluential and can be fixed in subsequent model calibration and uncertainty quantification. The use of the total sensitivity index to reduce model complexity in this manner is an important aspect of global sensitivity analysis.
Figure 15.2. Response values for fixed (a) $q_1$ and (b) $q_2$ used to construct the expectations $\mathbb{E}(Y \mid q_i)$ and variances $\mathrm{var}[\mathbb{E}(Y \mid q_i)]$.


15.1.2 Sobol Decomposition for General Densities

Here we consider the nonlinear model (15.6) where $Q_1, \ldots, Q_p$ are considered to be iid random variables with ranges $\Gamma_i$ and densities $\rho_{Q_i}(q_i)$. The range and joint density for $Q$ are then

$$ \Gamma = \prod_{k=1}^{p} \Gamma_k, \qquad \rho_Q(q) = \prod_{k=1}^{p} \rho_{Q_k}(q_k). $$

Independence is required to express $\rho_Q(q)$ as a product of marginal densities. The assumption of identical distributions is made solely to simplify notation, and these relations can be easily extended to accommodate differing distributions.

The complete Sobol decomposition of $f$ is then

$$ f(q) = \sum_{u \subseteq \{1, \ldots, p\}} f_u(q_u), \qquad (15.20) $$

where $u = \{i_1, \ldots, i_s\}$ is a set of integers with cardinality $s$, $q_u = [q_{i_1}, \ldots, q_{i_s}]$, and $f_{\emptyset} = f_0$. Each of the functions $f_u$, except $f_0$, is assumed to satisfy

$$ \int_{\Gamma_k} f_u(q_u)\, \rho_{Q_k}(q_k)\, dq_k = 0 \qquad (15.21) $$

for any $q_u$ and all $u \subseteq \{1, 2, \ldots, p\}$ that include $k$; this is the generalization of (15.8). This ensures that the components are orthogonal in the sense that

$$ \int_{\Gamma} f_u(q_u)\, f_v(q_v)\, \rho_Q(q)\, dq = 0 \quad \text{for all } u \ne v, $$

for which (15.9) is a special case. With assumption (15.21), the Sobol or HDMR decomposition is unique and the component functions are given by

$$ f_u(q_u) = \int_{\Gamma^{p-s}} f(q)\, \rho_{Q_{\sim u}}(q_{\sim u})\, dq_{\sim u} - \sum_{v \subsetneq u} f_v(q_v), $$

where $q_{\sim u}$ is defined in (15.11).

In a manner analogous to (15.12) and (15.13), the total and conditional or partial variances are defined by

$$ D = \int_{\Gamma} f^2(q)\, \rho_Q(q)\, dq - f_0^2 $$

and

$$ D_u = \int_{\Gamma^s} f_u^2(q_u)\, \rho_{Q_u}(q_u)\, dq_u = \mathrm{var}[\mathbb{E}(Y \mid q_u)] - \sum_{\emptyset \ne v \subsetneq u} D_v. $$

Due to the orthogonality of the functions, the total variance can be expressed as

$$ D = \sum_{\emptyset \ne u \subseteq \{1, \ldots, p\}} D_u. $$

The Sobol indices are defined to be

$$ S_u = \frac{D_u}{D} $$

so that

$$ \sum_{\emptyset \ne u \subseteq \{1, \ldots, p\}} S_u = 1. $$

The total sensitivity indices

$$ S_{T_k} = \sum_{u \ni k} S_u $$

quantify the sensitivity of the variance of $Y$ with respect to $Q_k$ along with its interaction with all other inputs. To illustrate for $p = 3$, we note that

$$ S_{T_2} = S_{\{2\}} + S_{\{1,2\}} + S_{\{2,3\}} + S_{\{1,2,3\}}. $$

Remark 15.5. The expansion (15.20) is complete in the sense that it includes all interaction terms through order $p$. For the following examples, we truncate at second-order terms since, as discussed in Section 13.5, HDMR techniques are efficient only if high-order interactions are negligible.






Remark 15.6. Variance-based indices are advantageous over regression- and correlation-based indices since they do not require linearity or monotonicity. For this reason, they are sometimes referred to as model-free methods. However, the assumption that parameters are mutually independent is typically required to ensure that $\rho_Q(q)$ can be expressed as a product of marginal densities. As detailed in Section 5.2, parameters in physical problems are typically correlated, which violates this assumption. If sufficient statistical information is available, one can employ Nataf or Rosenblatt transformations to reformulate the problem in terms of independent parameters. However, obtaining the required marginal and correlation information can be difficult for complex problems.


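Example 15.7. We revisit the additive portfolio model of Example 15.1 where

$$ \rho_{Q_1}(q_1) = \frac{1}{\sigma_1 \sqrt{2\pi}}\, e^{-q_1^2/2\sigma_1^2}, \qquad \rho_{Q_2}(q_2) = \frac{1}{\sigma_2 \sqrt{2\pi}}\, e^{-q_2^2/2\sigma_2^2} $$

and $\rho_Q(q) = \rho_{Q_1}(q_1)\, \rho_{Q_2}(q_2)$. Here

$$ f_0 = 0, \qquad f_1(q_1) = \int_{\mathbb{R}} (c_1 q_1 + c_2 q_2)\, \rho_{Q_2}(q_2)\, dq_2 = c_1 q_1, \qquad f_2(q_2) = c_2 q_2, $$

and $f_{12}(q_1, q_2) = 0$. The partial variances are

$$ D_1 = \int_{\mathbb{R}} c_1^2 q_1^2\, \rho_{Q_1}(q_1)\, dq_1 = c_1^2 \sigma_1^2, \qquad D_2 = c_2^2 \sigma_2^2, \qquad D_{12} = 0, $$

and the total variance is

$$ D = c_1^2 \sigma_1^2 + c_2^2 \sigma_2^2, $$

as noted in (15.2). The Sobol indices are

$$ S_1 = \frac{c_1^2 \sigma_1^2}{c_1^2 \sigma_1^2 + c_2^2 \sigma_2^2}, \qquad S_2 = \frac{c_2^2 \sigma_2^2}{c_1^2 \sigma_1^2 + c_2^2 \sigma_2^2}, \qquad S_{12} = 0, $$

so that $S_1 = 4/13$ and $S_2 = 9/13$ since $c_1 = 2$, $c_2 = 1$, $\sigma_1 = 1$, and $\sigma_2 = 3$. For this additive example, we thus see that $S_i = (S_i^{\sigma})^2$, where $S_i^{\sigma}$ is defined in (15.4).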

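Example 15.8. We now consider the model

$$ Y = Q_1 Q_3 + Q_2 Q_4, $$

where $Q = [Q_1, Q_2, Q_3, Q_4]$ is a random vector and

$$ Q_1 \sim N(0, \sigma_1^2), \qquad Q_2 \sim N(0, \sigma_2^2), \qquad Q_3 \sim N(c_1, \sigma_3^2), \qquad Q_4 \sim N(c_2, \sigma_4^2). $$

This is a generalization of the portfolio model (15.1), where the amounts are normally distributed about $c_1$ and $c_2$. In this case

$$ \begin{aligned} f_0 &= 0, \\ f_1(q_1) &= \int_{\mathbb{R}^3} \rho_{Q_2}(q_2)\, \rho_{Q_3}(q_3)\, \rho_{Q_4}(q_4)\, [q_1 q_3 + q_2 q_4]\, dq_2\, dq_3\, dq_4 = c_1 q_1, \\ f_2(q_2) &= c_2 q_2, \qquad f_3(q_3) = f_4(q_4) = 0, \\ f_{13}(q_1, q_3) &= \int_{\mathbb{R}^2} \rho_{Q_2}(q_2)\, \rho_{Q_4}(q_4)\, [q_1 q_3 + q_2 q_4]\, dq_2\, dq_4 - f_1(q_1) - f_3(q_3) - f_0 = q_1 q_3 - c_1 q_1, \\ f_{24}(q_2, q_4) &= q_2 q_4 - c_2 q_2, \qquad f_{12} = f_{14} = f_{23} = f_{34} = 0 \end{aligned} $$

and

$$ D_1 = c_1^2 \sigma_1^2, \quad D_2 = c_2^2 \sigma_2^2, \quad D_3 = D_4 = 0, \quad D_{13} = \int_{\mathbb{R}^2} \rho_{Q_1}(q_1)\, \rho_{Q_3}(q_3)\, q_1^2 (q_3 - c_1)^2\, dq_1\, dq_3 = \sigma_1^2 \sigma_3^2, \quad D_{24} = \sigma_2^2 \sigma_4^2, $$

so that, with $D = c_1^2 \sigma_1^2 + c_2^2 \sigma_2^2 + \sigma_1^2 \sigma_3^2 + \sigma_2^2 \sigma_4^2$,

$$ S_1 = \frac{c_1^2 \sigma_1^2}{D}, \qquad S_2 = \frac{c_2^2 \sigma_2^2}{D}, \qquad S_3 = S_4 = 0, \qquad S_{13} = \frac{\sigma_1^2 \sigma_3^2}{D}, \qquad S_{24} = \frac{\sigma_2^2 \sigma_4^2}{D}. $$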


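Example 15.9. Consider the Ishigami function

$$ f(q) = \sin q_1 + a \sin^2 q_2 + b\, q_3^4 \sin q_1 \qquad (15.22) $$

discussed in Example 13.8. It is established in Exercise 15.2 that the total and partial variances are

$$ \begin{aligned} D &= \frac{a^2}{8} + \frac{b \pi^4}{5} + \frac{b^2 \pi^8}{18} + \frac{1}{2}, \\ D_1 &= \frac{b \pi^4}{5} + \frac{b^2 \pi^8}{50} + \frac{1}{2}, \qquad D_2 = \frac{a^2}{8}, \qquad D_3 = 0, \\ D_{12} &= 0, \qquad D_{13} = \frac{b^2 \pi^8}{18} - \frac{b^2 \pi^8}{50}, \qquad D_{23} = D_{123} = 0. \end{aligned} \qquad (15.23) $$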
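The sketch below evaluates (15.23) and the resulting Sobol indices. The values $a = 7$ and $b = 0.1$ are an assumption: they are a common choice in the literature and are not specified here. The formulas correspond to the usual setting in which each $Q_i$ is uniform on $[-\pi, \pi]$. The helper name is hypothetical.

```python
import math

def ishigami_indices(a=7.0, b=0.1):
    """Sobol indices of the Ishigami function (15.22) from the variances (15.23)."""
    pi4, pi8 = math.pi ** 4, math.pi ** 8
    D = a**2 / 8 + b * pi4 / 5 + b**2 * pi8 / 18 + 0.5
    D1 = b * pi4 / 5 + b**2 * pi8 / 50 + 0.5
    D2 = a**2 / 8
    D13 = b**2 * pi8 / 18 - b**2 * pi8 / 50
    S = {"S1": D1 / D, "S2": D2 / D, "S3": 0.0, "S13": D13 / D}
    S["ST1"] = S["S1"] + S["S13"]   # Q1 also acts through the q1-q3 interaction
    S["ST2"] = S["S2"]              # Q2 enters additively
    S["ST3"] = S["S13"]             # Q3 acts only through that interaction
    return S

for name, value in ishigami_indices().items():
    print(f"{name} = {value:.4f}")
```

With these values the three nonzero indices $S_1$, $S_2$, and $S_{13}$ sum to one. The variable $Q_3$ has $S_3 = 0$ yet a sizable total index, a case that first-order indices alone would miss.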


The Sobol indices $S_i$ and $S_{ij}$ and total Sobol indices $S_{T_i}$ provide comprehensive measures for quantifying the influence of parameter uncertainty on the variance of the response. However, their computation can be prohibitively expensive for large parameter dimensions since they require the approximation of integrals up to dimension $p$. This has motivated significant research in two directions:

- techniques to efficiently compute the partial variances and Sobol indices in high dimensions;
- screening algorithms that approximate sensitivity information using a linearization of the model.

The algorithm summarized in Section 15.1.3 is widely employed to construct first-order sensitivity indices. Additionally, evaluation techniques based on the stochastic polynomial or cut-HDMR expansions discussed in Sections 13.5.2 and 13.5.4 constitute two recently employed techniques. We discuss Morris screening algorithms in Section 15.2.
15.1.3 Algorithm to Compute Sensitivity Indices

The computation of $D_i$ and $S_i$ given by (15.17) and (15.18) requires the approximation of $\mathrm{var}[\mathbb{E}(Y \mid q_i)]$. Suppose one uses $M$ Monte Carlo evaluations to approximate the conditional mean $\mathbb{E}(Y \mid q_i)$ for fixed $q_i$ and repeats the procedure $M$ times to approximate the variance. A total of $M^2$ evaluations will then be required to evaluate a single sensitivity index. For large parameter dimensions $p$, this brute-force approach is clearly prohibitive. The following algorithm of Saltelli [214], which is based on Sobol's original approach [227], reduces the number of required function evaluations to $M(p + 2)$.
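The following sketch implements the brute-force double loop to make the cost argument concrete. It uses the portfolio model of Example 15.7, with $Y = 2Q_1 + Q_2$, $Q_1 \sim N(0,1)$, $Q_2 \sim N(0,9)$, and exact $S_1 = 4/13$. The function names and the use of Python's pseudo-random `random.Random` generator are assumptions of this sketch. Estimating a single index already takes `m_outer * m_inner` model runs. The noise from the inner loop also slightly inflates the estimate of $\mathrm{var}[\mathbb{E}(Y \mid q_i)]$.

```python
import math
import random

def mean(xs):
    return math.fsum(xs) / len(xs)

def variance(xs):
    m = mean(xs)
    return math.fsum((x - m) ** 2 for x in xs) / len(xs)

def brute_force_first_order(f, samplers, i, m_outer=1000, m_inner=200, seed=1):
    """Double-loop Monte Carlo estimate of S_i = var[E(Y|q_i)] / var(Y), cf. (15.18).
    One index costs m_outer * m_inner model evaluations."""
    rng = random.Random(seed)
    cond_means, all_y = [], []
    for _ in range(m_outer):
        qi = samplers[i](rng)                   # outer loop: fix q_i
        ys = []
        for _ in range(m_inner):                # inner loop: resample the other inputs
            q = [draw(rng) for draw in samplers]
            q[i] = qi
            ys.append(f(q))
        cond_means.append(mean(ys))             # approximates E(Y | q_i)
        all_y.extend(ys)
    return variance(cond_means) / variance(all_y)

samplers = [lambda r: r.gauss(0.0, 1.0), lambda r: r.gauss(0.0, 3.0)]
S1 = brute_force_first_order(lambda q: 2.0 * q[0] + q[1], samplers, i=0)
print(S1)   # close to 4/13 (about 0.31), after 200,000 model runs
```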


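Algorithm 15.10.

1. Create two $M \times p$ sample matrices

$$ A = \begin{bmatrix} q_1^1 & \cdots & q_i^1 & \cdots & q_p^1 \\ \vdots & & \vdots & & \vdots \\ q_1^M & \cdots & q_i^M & \cdots & q_p^M \end{bmatrix}, \qquad B = \begin{bmatrix} \hat{q}_1^1 & \cdots & \hat{q}_i^1 & \cdots & \hat{q}_p^1 \\ \vdots & & \vdots & & \vdots \\ \hat{q}_1^M & \cdots & \hat{q}_i^M & \cdots & \hat{q}_p^M \end{bmatrix}, $$

where $q_j^m$ and $\hat{q}_j^m$ are quasi-random numbers drawn from the respective densities.

2. Create $M \times p$ matrices

$$ C_i = \begin{bmatrix} \hat{q}_1^1 & \cdots & q_i^1 & \cdots & \hat{q}_p^1 \\ \vdots & & \vdots & & \vdots \\ \hat{q}_1^M & \cdots & q_i^M & \cdots & \hat{q}_p^M \end{bmatrix}, \qquad i = 1, \ldots, p, $$

which are identical to $B$ with the exception that the $i$th column is taken from $A$.

3. Compute $M \times 1$ vectors of model outputs

$$ y_A = f(A), \qquad y_B = f(B), \qquad y_{C_i} = f(C_i) $$

by evaluating the model at the input values in $A$, $B$, and $C_i$. The evaluation of $y_A$ and $y_B$ requires $2M$ model evaluations, whereas the evaluation of $y_{C_i}$, $i = 1, \ldots, p$, requires $pM$ evaluations. Hence the total number of model evaluations is $M(p + 2)$.

4. The estimates for the first-order sensitivity indices are

$$ S_i = \frac{\mathrm{var}[\mathbb{E}(Y \mid q_i)]}{\mathrm{var}(Y)} \approx \frac{\frac{1}{M} y_A^T y_{C_i} - f_0^2}{\frac{1}{M} y_A^T y_A - f_0^2} = \frac{\frac{1}{M} \sum_{j=1}^{M} y_A^j y_{C_i}^j - f_0^2}{\frac{1}{M} \sum_{j=1}^{M} (y_A^j)^2 - f_0^2}, \qquad (15.24) $$

where the mean is approximated by

$$ f_0^2 = \left( \frac{1}{M} \sum_{j=1}^{M} y_A^j \right)^2. \qquad (15.25) $$

The estimates for the total effects indices are

$$ S_{T_i} = 1 - \frac{\mathrm{var}[\mathbb{E}(Y \mid q_{\sim i})]}{\mathrm{var}(Y)} \approx 1 - \frac{\frac{1}{M} y_B^T y_{C_i} - f_0^2}{\frac{1}{M} y_A^T y_A - f_0^2} = 1 - \frac{\frac{1}{M} \sum_{j=1}^{M} y_B^j y_{C_i}^j - f_0^2}{\frac{1}{M} \sum_{j=1}^{M} (y_A^j)^2 - f_0^2}. \qquad (15.26) $$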

The intuition for the algorithm is the following. In the scalar product $y_A^T y_{C_i}$, the response computed from values in $A$ is multiplied by values for which all parameters except $q_i$ have been resampled. If $q_i$ is influential, then large (or small) values of $y_A$ will be correspondingly multiplied by large (or small) values of $y_{C_i}$, yielding a large value of $S_i$. If $Q_i$ is not influential, large and small values of $y_A$ and $y_{C_i}$ will occur more randomly and $S_i$ will be small.

Details regarding the derivation of the algorithm and modifications to improve its accuracy are provided in [155, 214, 215].
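The steps of Algorithm 15.10 translate almost line for line into code. The sketch below is an illustration under stated assumptions:

- `saltelli_indices` is a hypothetical name.
- Ordinary pseudo-random draws from `random.Random` replace the quasi-random samples called for in Step 1.
- The test model is again the portfolio model. Because it is additive, $S_i = S_{T_i}$, with values $4/13$ and $9/13$.

```python
import random

def saltelli_indices(f, samplers, M=20000, seed=0):
    """First-order and total Sobol indices following Algorithm 15.10 and the
    estimators (15.24)-(15.26); costs M * (p + 2) model evaluations."""
    rng = random.Random(seed)
    p = len(samplers)
    # Step 1: two independent M x p sample matrices (pseudo-random here)
    A = [[draw(rng) for draw in samplers] for _ in range(M)]
    B = [[draw(rng) for draw in samplers] for _ in range(M)]
    # Step 3, first part: 2M evaluations at A and B
    yA = [f(row) for row in A]
    yB = [f(row) for row in B]
    f0_sq = (sum(yA) / M) ** 2                         # (15.25)
    var_y = sum(y * y for y in yA) / M - f0_sq         # common denominator
    S, ST = [], []
    for i in range(p):
        # Step 2: C_i is B with its ith column taken from A
        Ci = [b[:i] + [a[i]] + b[i + 1:] for a, b in zip(A, B)]
        yC = [f(row) for row in Ci]                    # M evaluations per input
        # Step 4: estimators (15.24) and (15.26)
        S.append((sum(u * v for u, v in zip(yA, yC)) / M - f0_sq) / var_y)
        ST.append(1.0 - (sum(u * v for u, v in zip(yB, yC)) / M - f0_sq) / var_y)
    return S, ST

samplers = [lambda r: r.gauss(0.0, 1.0), lambda r: r.gauss(0.0, 3.0)]
S, ST = saltelli_indices(lambda q: 2.0 * q[0] + q[1], samplers)
print(S, ST)   # both close to [4/13, 9/13] for this additive model
```

The run uses 80,000 model evaluations for two indices of each kind. A brute-force double loop of comparable accuracy would need on the order of $10^5$ or more evaluations per index.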


15.2 Morris Screening

Screening methods provide an alternative to variance-based methods for identifying critical inputs to high-dimensional input spaces or models whose computational expense prohibits construction of Sobol indices. Screening methods generally provide the capability to rank parameters according to their importance but, unlike variance-based methods, they typically do not quantify how much more important one parameter is than another.

As detailed in the review paper [216], screening methods are in the class of One-factor-At-a-Time (OAT) methods in which one measures the variation in outputs as inputs are varied individually. The Morris algorithm [173] partially eliminates the local nature of OAT methods, which is one of their main limitations, by averaging over local derivative approximations to provide more global sensitivity measures. The goal of Morris screening is to identify those inputs or parameters (collectively termed factors in the literature) that are (i) negligible, (ii) linear and additive, or (iii) nonlinear or comprised of interactions between inputs.

We again consider the model

$$ Y = f(q), \qquad q = [q_1, \ldots, q_p]. $$

We assume that each input term has been scaled to the interval $[0, 1]$ but, as noted in Remark 15.12, other scalings can be employed to facilitate computations.

The concept of Morris screening is very simple: one averages coarse local sensitivity approximations, often termed elementary effects, over the input space to provide a measure of global sensitivity. Hence it is based on a linearization of the model. To construct the elementary effect, one partitions $[0, 1]$ into $\ell$ levels, which, as illustrated in Figure 15.3, restricts each input to $\ell$ values. The elementary effect associated with the $i$th input is then defined by the difference quotient

$$ d_i(q) = \frac{f(q_1, \ldots, q_{i-1}, q_i + \Delta, q_{i+1}, \ldots, q_p) - f(q)}{\Delta} = \frac{f(q + \Delta e_i) - f(q)}{\Delta}, \qquad (15.27) $$

where the stepsize $\Delta$ is chosen from the set

$$ \Delta \in \left\{ \frac{1}{\ell - 1}, \ldots, 1 - \frac{1}{\ell - 1} \right\}. \qquad (15.28) $$

As illustrated in Figure 15.3, possible stepsizes for $\ell = 4$ are $\Delta \in \{1/3, 2/3\}$. If one denotes the set of gridpoints by $\Gamma_\ell$, the definition (15.27) holds for any input vector $q$ such that $q + \Delta e_i \in \Gamma_\ell$, where $e_i$ is a vector of zeros with one in the $i$th component.
Figure 15.3. Four-level grid ($\ell = 4$) for $q = [q_1, q_2]$ with $\Delta = 2/3$.

Remark 15.11. Due to the magnitude of $\Delta$, the elementary effect is a very coarse approximation of the local sensitivity. It can be used to rank the relative importance of inputs but not to resolve fine-scale gradient or local sensitivity behavior unless one employs significantly smaller stepsizes $\Delta$.

Remark 15.12. Rather than scaling parameters to the unit hypercube $\Gamma = [0, 1]^p$, some authors scale so that $0 \le q_i \le \ell - 1$ [56]. This yields $\Delta \in \{1, \ldots, \ell - 2\}$ so that parameters are evaluated at integer values.


Remark 15.13. It is illustrated in Example 15.16 that the unscaled elementary effect $d_i(q)$ given by (15.27) yields an incorrect classification of parameters for the linear model (15.1), since it lacks a mechanism to incorporate the variability of parameters. This motivates the use of the scaled elementary effect

$$ d_i^{\sigma}(q) = \frac{f(q + \Delta e_i) - f(q)}{\Delta}\, \frac{\sigma_i}{\sigma_Y}, \qquad (15.29) $$

where $\sigma_i$ and $\sigma_Y$ are the standard deviations of the parameter $Q_i$ and response $Y$ [223]. This is analogous to the sigma-normalized sensitivity relation (15.4), which is hybrid local-global in nature.


The elementary effects quantify the approximate, large-scale, local sensitivity behavior at the point $q$. To provide a pseudoglobal sensitivity measure, one approximates the mean and variance of the finite-dimensional distribution $G_i$ associated with each $|d_i(q)|$. This distribution is constructed by randomly sampling $q$ from admissible points in $\Gamma_\ell$. As detailed in [215], using the distribution associated with $|d_i(q)|$ rather than $d_i(q)$ avoids Type II errors, which can occur when the distribution has both positive and negative elements.

For $r$ sample points, the sensitivity measures for $G_i$ are taken to be the sampling mean and variance

$$ \mu_i^* = \frac{1}{r} \sum_{j=1}^{r} |d_i^j(q^j)|, \qquad \sigma_i^2 = \frac{1}{r - 1} \sum_{j=1}^{r} \left( d_i^j(q^j) - \mu_i \right)^2, \qquad \mu_i = \frac{1}{r} \sum_{j=1}^{r} d_i^j(q^j), \qquad (15.30) $$

where

$$ d_i^j(q^j) = \frac{f(q^j + \Delta e_i) - f(q^j)}{\Delta} \qquad (15.31) $$

is the elementary effect associated with the $i$th parameter and $j$th sample. The mean $\mu_i^*$ quantifies the individual effect of the input on the output, whereas the variance $\sigma_i^2$ estimates the combined effects of the input due to nonlinearities or interactions with other inputs. The latter interpretation is motivated by the observation that large variances indicate a strong dependence on neighboring input values.

It is illustrated in Examples 15.17 and 15.18 that the ordering of $\mu_i^*$ and $\sigma_i$ can be used to rank parameters according to their relative importance, in a manner similar to that provided by the total sensitivity indices. This can be used to determine noninfluential parameters that can be fixed in subsequent model calibration, sensitivity analysis, and uncertainty quantification to reduce model complexity.

While $\mu_i^*$ and $\sigma_i$ can be used to screen the effects of individual parameters, they do not quantify the magnitude of parameter interactions. Second-order interactions can be screened by approximating cross derivatives in the manner detailed in [59, ...].

Implementation issues include the choice of $\ell$, the choice of $\Delta$, and strategies to optimally sample $r$ elementary effects from $G_i$. The choices of $\ell$ and $r$ are linked in the sense that larger values of both yield improved accuracy. As detailed in [175], taking $\ell$ to be even and choosing $\Delta = \ell / (2(\ell - 1))$ has the advantage that it guarantees equal probability sampling from the distributions $G_i$. This motivated our choices of $\ell = 4$ and $\Delta = 2/3$ in Figure 15.3 and Example 15.14. We next discuss a strategy for efficiently sampling elementary effects from $G_i$.

Morris Sampling Strategy

Since the computation of each elementary effect requires two model evaluations, naive sampling would require $2pr$ model evaluations to construct $\mu_i^*$ and $\sigma_i$ using $r$ sample points. In the Morris sampling algorithm, one employs neighbors, in the manner illustrated in Figure 15.4, to reduce the number of model evaluations required to construct $p$ elementary effects from $2p$ to $p + 1$. In this manner, $\mu_i^*$ and $\sigma_i$ can be constructed with $(p + 1)r$ samples.

Figure 15.4. (a) Random initial vector $q^*$ and model evaluations required to construct $B^*$ for Example 15.14. (b) Example trajectory when $p = 3$.

To construct trajectories in which neighbors differ in only one component, as required for the difference formulas (15.27) and (15.31), one employs a $(p + 1) \times p$ sampling matrix $B^*$. Its $p + 1$ rows are model realizations, with the elements in the $i$th row representing the parameter values used in the $i$th evaluation. This matrix is constructed so that for every column $j = 1, \ldots, p$, there are two rows that differ only in their $j$th component. By subtracting the elements in consecutive rows, one can evaluate the $p$ elementary effects associated with the random initial point $q^*$.

A deterministic $B^*$ is given by

$$ B^* = J_{p+1,1}\, q^* + \Delta B, $$

where $B$ is taken to be a $(p + 1) \times p$ strictly lower triangular matrix of ones, $J_{p+1,1}$ is a $(p + 1) \times 1$ vector of ones, and $\Delta$ is the stepsize from (15.28). The difficulty is that elementary effects constructed in this manner are not randomly selected.

To obtain random samples from $G_i$, one employs the orientation matrix

$$ B^* = \left( J_{p+1,1}\, q^* + \frac{\Delta}{2} \left[ (2B - J_{p+1,p}) D^* + J_{p+1,p} \right] \right) P^*, $$

where $J_{p+1,p}$ is a $(p + 1) \times p$ matrix of ones and $D^*$ is a $p \times p$ diagonal matrix whose elements are randomly chosen from the set $\{-1, 1\}$. The $p \times p$ matrix $P^*$ is constructed by randomly permuting the columns of a $p \times p$ identity matrix.
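The Morris sampling strategy and the measures (15.30) can be combined into a short program. The sketch below makes these assumptions:

- Inputs are scaled to $[0,1]$.
- The seed $q^*$ is drawn uniformly from grid values no larger than $1 - \Delta$.
- The stepsize is $\Delta = \ell/(2(\ell-1))$.
- The optimized seed selection discussed below is not implemented.

The function names are hypothetical. Each elementary effect is divided by the signed step between consecutive rows of $B^*$, so it agrees with (15.27) whichever direction the trajectory moves.

```python
import math
import random

def morris_trajectory(p, levels, delta, rng):
    """Randomized (p+1) x p orientation matrix
    B* = (J q* + (delta/2)[(2B - J) D* + J]) P*, returned as a list of rows."""
    n_seed = round((1.0 - delta) * (levels - 1)) + 1   # grid values with q*_j + delta <= 1
    q_star = [rng.randrange(n_seed) / (levels - 1) for _ in range(p)]
    signs = [rng.choice((-1, 1)) for _ in range(p)]    # diagonal of D*
    perm = list(range(p))
    rng.shuffle(perm)                                  # column permutation P*
    traj = []
    for k in range(p + 1):
        # B is strictly lower triangular: B[k][j] = 1 exactly when j < k
        row = [q_star[j] + 0.5 * delta * ((2 * (j < k) - 1) * signs[j] + 1)
               for j in range(p)]
        traj.append([row[perm[c]] for c in range(p)])  # right-multiply by P*
    return traj

def morris_measures(f, p, levels=4, r=20, seed=0):
    """Sample statistics mu_i, mu_i^*, sigma_i of (15.30) from r trajectories,
    using r * (p + 1) model evaluations."""
    delta = levels / (2.0 * (levels - 1))
    rng = random.Random(seed)
    effects = [[] for _ in range(p)]
    for _ in range(r):
        traj = morris_trajectory(p, levels, delta, rng)
        ys = [f(q) for q in traj]
        for k in range(p):
            # consecutive rows differ in exactly one component j
            j = next(c for c in range(p) if traj[k + 1][c] != traj[k][c])
            step = traj[k + 1][j] - traj[k][j]          # +delta or -delta
            effects[j].append((ys[k + 1] - ys[k]) / step)
    mu = [sum(d) / r for d in effects]
    mu_star = [sum(abs(x) for x in d) / r for d in effects]
    sigma = [math.sqrt(sum((x - m) ** 2 for x in d) / (r - 1))
             for d, m in zip(effects, mu)]
    return mu, mu_star, sigma

mu, mu_star, sigma = morris_measures(lambda q: 2.0 * q[0] + q[1], p=2)
print(mu_star, sigma)   # [2, 1] and [0, 0] up to rounding
```

For the linear function $f(q) = 2q_1 + q_2$ every elementary effect is exact, so $\mu^* = [2, 1]$ and $\sigma = [0, 0]$. Applied to the g-function (15.32) with the coefficients of Table 15.1, the same routine ranks $Q_3$ and $Q_4$ first.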


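Example 15.14. Take $p = 2$, $\ell = 4$, and $\Delta = 2/3$, as depicted in Figure 15.4(a). The seed value $q^* = [1/3, 1/3]$ and choices

$$ D^* = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}, \qquad J_{3,2} = \begin{bmatrix} 1 & 1 \\ 1 & 1 \\ 1 & 1 \end{bmatrix}, \qquad B = \begin{bmatrix} 0 & 0 \\ 1 & 0 \\ 1 & 1 \end{bmatrix}, \qquad P^* = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} $$

yield

$$ B^* = \begin{bmatrix} 1 & 1/3 \\ 1 & 1 \\ 1/3 & 1 \end{bmatrix}. $$

Hence one employs model evaluations at $q^1 = [1, 1/3]$, $q^2 = [1, 1]$, and $q^3 = [1/3, 1]$ to construct two elementary effects $d_1$ and $d_2$ associated with $q^*$. We note that $q^*$ seeds the algorithm but is not employed as a sample point. A similar trajectory for $p = 3$ is plotted in Figure 15.4(b).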
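The orientation matrix in Example 15.14 can be verified with exact rational arithmetic. The sketch below uses a hypothetical helper, `orientation_matrix`, with the choices $q^* = [1/3, 1/3]$, $\Delta = 2/3$, $D^* = \mathrm{diag}(1, -1)$, and a $P^*$ that swaps the two columns. These choices produce the sample points $q^1 = [1, 1/3]$, $q^2 = [1, 1]$, and $q^3 = [1/3, 1]$. The permutation is encoded as a list `perm`: column `c` of $B^*$ is column `perm[c]` of the unpermuted matrix, which is equivalent to right-multiplying by $P^*$.

```python
from fractions import Fraction

def orientation_matrix(q_star, delta, d_signs, perm):
    """Exact B* = (J q* + (delta/2)[(2B - J) D* + J]) P* for given D* and P*.
    d_signs is the diagonal of D*; column c of B* is column perm[c] before P*."""
    p = len(q_star)
    rows = []
    for k in range(p + 1):
        row = [q_star[j] + delta / 2 * ((2 * (1 if j < k else 0) - 1) * d_signs[j] + 1)
               for j in range(p)]
        rows.append([row[perm[c]] for c in range(p)])
    return rows

third = Fraction(1, 3)
B_star = orientation_matrix([third, third], Fraction(2, 3), d_signs=[1, -1], perm=[1, 0])
for row in B_star:
    print([str(x) for x in row])   # ['1', '1/3'], ['1', '1'], ['1/3', '1']
```

Consecutive rows differ in exactly one coordinate, first $q_2$ and then $q_1$. This is what allows $p + 1 = 3$ evaluations to yield both elementary effects.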


The final issue concerns the generation of the seed parameter values $q^*$. In Morris' algorithm, $q^*$ was generated randomly from values in the parameter space. However, this can lead to nonoptimal coverage of the parameter space for high-dimensional problems. This is addressed in [60] by an algorithm that optimizes the distance between initial points to ensure that they best cover the input space. The determination of optimal distance metrics remains an open research topic.

Remark 15.15. The only point at which the parameter densities play a role is in the generation of random seed values $q^*$. For nonuniform densities, samples can be mapped to $[0, 1]$ using an appropriate transformation. If joint densities are available, one can relax the requirement that parameters be mutually independent.


Example 15.16. We revisit the linear portfolio model

$$ Y = c_1 Q_1 + c_2 Q_2 $$

of Example 15.1 where $c_1 = 2$, $c_2 = 1$ and $Q_1 \sim N(0, 1)$, $Q_2 \sim N(0, 9)$. For any choice of $\ell$, $r$, and $\Delta$, the elementary effects satisfy $|d_i| = c_i$, so $|d_i|$ is the local sensitivity $S_i = \partial Y / \partial Q_i$ defined in (15.3). Hence the means of $G_i$ are $\mu_1^* = 2$ and $\mu_2^* = 1$. These values reflect the local behavior rather than the uncertainty associated with $Q_1$ and $Q_2$. We note that $\mu_1$ can attain values between $-2$ and $2$ since $d_1 = \pm c_1$. This illustrates an advantage of considering the distribution $G_i$ associated with $|d_i|$. It also illustrates a limitation of this sampling strategy when unscaled elementary effects $d_i$ are used for global sensitivity analysis. Alternatively, use of the sigma-normalized elementary effect $d_i^{\sigma}$ given by (15.29) yields $\mu_1^* = 2/\sqrt{13}$ and $\mu_2^* = 3/\sqrt{13}$. These are the same as the sigma-normalized sensitivity values obtained in (15.5), and they are consistent with the scatterplot information shown in Figure 15.1.
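The contrast between unscaled and sigma-normalized elementary effects in Example 15.16 can be reproduced in a few lines. This sketch assumes an arbitrary evaluation point and step. Because the model is linear, any choice gives the same effects. The helper `elementary_effect` is hypothetical; it implements (15.27), with the optional factor $\sigma_i/\sigma_Y$ of (15.29).

```python
import math

c = [2.0, 1.0]          # c_1, c_2 of the portfolio model
sigma = [1.0, 3.0]      # Q_1 ~ N(0, 1), Q_2 ~ N(0, 9)
sigma_y = math.sqrt(sum((ci * si) ** 2 for ci, si in zip(c, sigma)))   # sqrt(13)

def f(q):
    return c[0] * q[0] + c[1] * q[1]

def elementary_effect(f, q, i, delta, scale=1.0):
    """(f(q + delta e_i) - f(q)) / delta, cf. (15.27), times an optional scale factor."""
    shifted = list(q)
    shifted[i] += delta
    return (f(shifted) - f(q)) / delta * scale

q, delta = [0.0, 1.0 / 3.0], 2.0 / 3.0    # arbitrary point and step
d = [elementary_effect(f, q, i, delta) for i in range(2)]
d_sigma = [elementary_effect(f, q, i, delta, sigma[i] / sigma_y) for i in range(2)]   # (15.29)
print(d)         # [2.0, 1.0] up to rounding: the local slopes c_i
print(d_sigma)   # about [0.555, 0.832], i.e. [2, 3] / sqrt(13)
```

The unscaled effects rank $Q_1$ first. The scaled effects reverse the ordering, in agreement with the variance-based indices $S_1 = 4/13$ and $S_2 = 9/13$.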


Example 15.17. Consider the function

$$ Y = \prod_{i=1}^{p} g_i(Q_i), \qquad g_i(Q_i) = \frac{|4Q_i - 2| + a_i}{1 + a_i}, \qquad (15.32) $$

attributed to Sobol [227], where $a_i \ge 0$ are fixed, deterministic coefficients. Since

$$ 1 - \frac{1}{1 + a_i} \le g_i(Q_i) \le 1 + \frac{1}{1 + a_i}, $$

the coefficients $a_i$ determine the relative importance of the random parameters, as illustrated in Figure 15.5. This function is widely employed as a test for global sensitivity analysis since it is strongly nonlinear, nonmonotonic, and has nonzero interaction terms by construction. Furthermore, the partial and total variances have the explicit representations

$$ \begin{aligned} D_i &= \mathrm{var}[\mathbb{E}(Y \mid q_i)] = \frac{1}{3(1 + a_i)^2}, \\ D_{ij} &= \mathrm{var}[\mathbb{E}(Y \mid q_i, q_j)] - D_i - D_j = D_i D_j, \\ D &= \mathrm{var}(Y) = -1 + \prod_{i=1}^{p} (1 + D_i) \end{aligned} \qquad (15.33) $$

for $Q_i \sim \mathcal{U}(0, 1)$, $i = 1, \ldots, p$. This yields explicit formulas for the Sobol indices $S_i = D_i/D$, $S_{ij} = D_{ij}/D$, and total sensitivity $S_{T_i}$ given by (15.14).

Figure 15.5. Component functions $g_i(q_i)$ for $a = 0.5$, $a = 2$, and $a = 97$.

|         | $a_i$ | $D_i$ | $S_i$ | $S_{T_i}$ |
| $Q_1$ | 78  | $5.3 \times 10^{-5}$ | $2.8 \times 10^{-4}$ | $3.3 \times 10^{-4}$ |
| $Q_2$ | 12  | $2.0 \times 10^{-3}$ | $1.0 \times 10^{-2}$ | $1.2 \times 10^{-2}$ |
| $Q_3$ | 0.5 | $1.5 \times 10^{-1}$ | $7.7 \times 10^{-1}$ | $8.0 \times 10^{-1}$ |
| $Q_4$ | 2   | $3.7 \times 10^{-2}$ | $1.9 \times 10^{-1}$ | $2.2 \times 10^{-1}$ |
| $Q_5$ | 97  | $3.5 \times 10^{-5}$ | $1.8 \times 10^{-4}$ | $2.1 \times 10^{-4}$ |
| $Q_6$ | 33  | $2.9 \times 10^{-4}$ | $1.5 \times 10^{-3}$ | $1.8 \times 10^{-3}$ |

Table 15.1. Coefficients, first-order partial variances, and Sobol indices for $p = 6$.

The first-order partial variances and Sobol indices for six coefficients $a_i$ are summarized in Table 15.1. We then took $\ell = 4$, $\Delta = 2/3$, and $r = 4$. Using the sampled trajectories compiled in Table 3.2 of [215], we obtained the sensitivity measures $\mu_i$, $\mu_i^*$, and $\sigma_i$ summarized in Table 15.2. Finally, scatterplots obtained with 500 joint realizations of $(q_i, y)$, $i = 1, \ldots, 6$, are plotted in Figure 15.6.

For this choice of coefficients $a_i$, $\sum_{i=1}^{6} S_i = 0.97$ and $\sum_{i=1}^{6} S_{T_i} = 1.03$, so second-order interactions are negligible. This is because the large values of $a_i$ yield the small reported partial variances $D_i$, which in turn produce negligible cross partial variances $D_{ij} = D_i D_j$. The effect of significant interaction terms due to small values of $a_i$ is illustrated in Exercise 15.4.
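Because (15.33) is explicit, the entries of Table 15.1 can be regenerated directly. The sketch below uses a hypothetical helper, `g_function_indices`. It assumes $Q_i \sim \mathcal{U}(0,1)$ and forms $S_{T_i}$ from (15.14) with $S_{ij} = D_i D_j / D$, as stated above. The sums it prints should come out at 0.97 and 1.03.

```python
import math

def g_function_indices(a):
    """Partial variances (15.33) of the Sobol g-function (15.32), Q_i ~ U(0, 1),
    with first-order indices S_i and total indices S_Ti from (15.14)."""
    D_i = [1.0 / (3.0 * (1.0 + ai) ** 2) for ai in a]
    D = math.prod(1.0 + d for d in D_i) - 1.0        # D = -1 + prod(1 + D_i)
    S = [d / D for d in D_i]
    # (15.14) with S_ij = D_i D_j / D:  S_Ti = S_i + sum_{j != i} S_ij
    ST = [(d + d * (sum(D_i) - d)) / D for d in D_i]
    return D_i, D, S, ST

D_i, D, S, ST = g_function_indices([78, 12, 0.5, 2, 97, 33])   # coefficients of Table 15.1
for k, (d, s, st) in enumerate(zip(D_i, S, ST), start=1):
    print(f"Q_{k}: D_i = {d:.1e}, S_i = {s:.1e}, S_Ti = {st:.1e}")
print(round(sum(S), 2), round(sum(ST), 2))   # 0.97 1.03
```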

The scatterplots in Figure 15.6 reveal that the parameter $Q_3$ is most influential since the realizations $(q_3, y)$ clearly exhibit the underlying functional behavior of the model (15.32). The parameter $Q_4$ is the second most influential. The remaining parameters have significantly less influence, as illustrated by the observation that their scatterplots are nearly uniform. This trend is reflected in the partial variances, Sobol indices, and Morris sensitivity measures: $D_3$, $D_4$, $S_3$, $S_4$, $\mu_3^*$, $\mu_4^*$, $\sigma_3$, and $\sigma_4$ are dominant over the remaining sensitivity indices. Because $Q_1$, $Q_2$, $Q_5$, and $Q_6$ are relatively insensitive, their values can be fixed for subsequent model calibration and sensitivity analysis. To illustrate, Figure 15.7 plots the scatterplots obtained with $q_1$, $q_2$, $q_5$, and $q_6$ held fixed and $Q_3 \sim \mathcal{U}(0, 1)$, $Q_4 \sim \mathcal{U}(0, 1)$. Comparison with Figure 15.6 shows that the effects of uncertainty in $Q_3$ and $Q_4$ on $Y$ are nearly the same in the two cases. This illustrates the manner in which global sensitivity analysis can reduce the number of critical random parameters without significantly diminishing the model accuracy.

|            | $Q_1$  | $Q_2$  | $Q_3$  | $Q_4$  | $Q_5$ | $Q_6$  |
| $\mu_i$    | -0.005 | -0.078 | -0.130 | -0.004 | 0.012 | -0.004 |
| $\mu_i^*$  | 0.050  | 0.277  | 1.700  | 1.185  | 0.035 | 0.090  |
| $\sigma_i$ | 0.064  | 0.321  | 2.040  | 1.370  | 0.041 | 0.122  |

Table 15.2. Estimated Morris sensitivity measures obtained with $\ell = 4$, $\Delta = 2/3$, and $r = 4$ from [215].

Figure 15.6. Scatterplots of $y$ versus $q_1$ through $q_6$, constructed using 500 joint realizations.

Figure 15.7. Scatterplots of $y$ versus $q_3$ and $q_4$ with $q_1$, $q_2$, $q_5$, and $q_6$ held fixed.


15.3 Time- or Space-Dependent Responses

So far in this chapter, we have assumed that the nonlinear model $Y = f(Q)$ is scalar-valued and a function only of the parameters. However, several of the models in Chapters 2 and 3 are also functions of time, space, or other independent variables. We summarize here issues associated with global sensitivity analysis for time- and space-dependent responses

$$ Y(t) = f(t, u, Q) \qquad (15.34) $$

or

$$ Y(x) = f(x, u, Q), \qquad (15.35) $$

where $u$ is the state and $x \in \mathbb{R}$, $\mathbb{R}^2$, or $\mathbb{R}^3$.


Time-Dependent Response

We consider first the time-dependent case. The most direct approach is to construct a set of sensitivity indices $\{S_i(t_j)\}$ or $\{\mu_i^*(t_j)\}$, $i = 1, \ldots, p$, at time points $t_j$ of interest to quantify the influence of parameters throughout the time interval. Sensitivity measures constructed in this manner can indicate whether the relative influence of parameters changes as a function of time. Alternatively, if the objective is to identify the most influential parameters for the entire time interval, the time-dependent response can be integrated to obtain a scalar-valued response

$$ Y = \int f(t, u, q)\, dt. \qquad (15.36) $$

As illustrated in Example 15.18, one might employ this approach if the goal is to reduce the model complexity by fixing noninfluential parameters.

For certain applications, one may seek to extract salient features of the time-dependent output and quantify how they depend on parameter uncertainties. One approach is to combine functional principal component analysis (fPCA) with the variance-based or screening techniques of Sections 15.1 or 15.2 to identify those parameters and their interactions that most strongly influence a dynamical model. We summarize the methodology of [242] and refer readers to that reference for details illustrating the approach for an insulin signaling model.

The global sensitivity analysis approach based on fPCA is depicted in Figure 15.8. One first generates $M$ parameter values $\{q_i^m\}_{m=1}^{M}$ from respective densities $\rho_{Q_i}(q_i)$ using the sampling techniques detailed in Sections 15.1 or 15.2. For example, this would entail quasi-random number generation if using Algorithm 15.10 or the generation of seed values $q^*$ if using Morris screening. Based on the assumption of mutually independent parameters, this yields $M$ parameter vectors $q^m = [q_1^m, \ldots, q_p^m]$. For each parameter vector, one evaluates the model to obtain $M$ time-dependent responses $\{y^m(t)\}$. fPCA is then used to construct a low number $M_{pc} \ll M$ of functions $\xi_k(t)$ and coefficients $y_k^m$ so that each response can be represented as

$$ y^m(t) = \sum_{k=1}^{M_{pc}} y_k^m\, \xi_k(t), \qquad m = 1, \ldots, M. $$

The coefficient $y_k^m$ quantifies the degree to which the component $\xi_k(t)$ contributes to the response. Hence determining how parameters influence each coefficient provides a measure of how they will influence the model behavior described by $\xi_k(t)$. The techniques of Sections 15.1 and 15.2 are then used to determine those parameters that have the most influence on the fPCA coefficients so that noninfluential parameters can be fixed to reduce model complexity.
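The fPCA step can be illustrated with discretized curves. The sketch below is a stand-in under simplifying assumptions, not the procedure of [242]:

- All responses are sampled on a common time grid.
- The functional components are approximated by ordinary principal components of the centered curve vectors, so the representation includes a mean curve.
- Only the leading component $\xi_1$ is computed, by power iteration.

The synthetic curves $y^m(t) = 1 + a_m \sin t$ are hypothetical. For them, $\xi_1$ is proportional to $\sin t$ and the coefficients $y_1^m$ reproduce the centered amplitudes. These coefficients could then be passed to Algorithm 15.10 or Morris screening as scalar quantities of interest.

```python
import math
import random

def leading_component(curves, iters=50):
    """Leading discretized principal component xi_1 of response curves on a common
    time grid, via power iteration on the sample covariance of the centered curves.
    Returns the mean curve, xi_1 (unit Euclidean norm), and coefficients y_1^m."""
    M, n = len(curves), len(curves[0])
    mean = [sum(c[t] for c in curves) / M for t in range(n)]
    X = [[c[t] - mean[t] for t in range(n)] for c in curves]      # centered curves
    v = [1.0] * n                                                  # starting vector
    for _ in range(iters):
        proj = [sum(xt * vt for xt, vt in zip(x, v)) for x in X]  # X v
        w = [sum(pm * x[t] for pm, x in zip(proj, X)) / (M - 1)   # X^T X v / (M - 1)
             for t in range(n)]
        norm = math.sqrt(sum(wt * wt for wt in w))
        v = [wt / norm for wt in w]
    coeffs = [sum(xt * vt for xt, vt in zip(x, v)) for x in X]    # y_1^m
    return mean, v, coeffs

# Synthetic stand-in for model responses: y^m(t) = 1 + a_m sin(t) on [0, pi]
rng = random.Random(0)
ts = [math.pi * k / 49 for k in range(50)]
amps = [rng.uniform(-1.0, 1.0) for _ in range(40)]
curves = [[1.0 + am * math.sin(t) for t in ts] for am in amps]
mean_curve, xi, coeffs = leading_component(curves)
```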
Figure 15.8. (a) Samples from marginal densities $\rho_{Q_i}(q_i)$ to construct $M$ parameter vectors $q^m = [q_1^m, \ldots, q_p^m]$ and (b) responses $y^m(t) = f(t, u, q^m)$, $m = 1, \ldots, M$. (c) Functional principal components $\xi_k(t)$ and (d) sensitivity of coefficients $y_k^m$ to model parameters.

Details comparing the performance of Sobol indices and Morris screening in this framework are illustrated for a complex insulin signaling model in [242]. Those results demonstrate that, for this application, the Morris screening approach provides qualitative sensitivity measures that are consistent with the Sobol indices. It does so at substantially less computational cost: on the order of minutes for Morris screening, compared with approximately 1.06 days for computing Sobol indices.


Spatially Dependent Response

Techniques for global sensitivity analysis for spatially varying responses (15.35), or responses that are functions of other independent variables, are similar to those for time-varying responses; hence we simply provide references. Techniques for general functional outputs are discussed in [58], whereas the analytic and numerical computation of spatially varying Sobol indices is detailed in [166]. The paper [155] provides an overview of techniques for global sensitivity analysis with spatially varying models along with a review of applications.


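Example 15.18. To illustrate the global sensitivity analysis of a time-dependent model, we consider the SIR disease model

$$ \begin{aligned} \frac{dS}{dt} &= \delta N - \delta S - \gamma k I S, & S(0) &= S_0, \\ \frac{dI}{dt} &= \gamma k I S - (r + \delta) I, & I(0) &= I_0, \\ \frac{dR}{dt} &= r I - \delta R, & R(0) &= R_0, \end{aligned} \qquad (15.37) $$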
where $S(t)$, $I(t)$, and $R(t)$ are the numbers of susceptible, infectious, and recovered individuals in a population of size $N$. As noted in Example 3.4, the parameters are

$$
q = [\gamma, k, r, \delta],
$$

where $\gamma$, $k$, and $r$ respectively denote the infection coefficient, the interaction coefficient (which quantifies the probability that an individual comes in contact with others), and the recovery rate. Whereas $\gamma$ and $k$ both influence the disease dynamics, they differ in the sense that $\gamma$ is a property of the disease, while $k$ reflects the degree of personal contact. Hence $\gamma$ is difficult to control, while $k$ can be controlled via policies such as isolation or quarantine. Finally, the birth and death rates are assumed to be equal, with both denoted by $\delta$.

We take the parameter distributions to be

$$
\gamma \sim \mathcal{U}(0, 1), \quad k \sim \text{Beta}(\alpha, \beta), \quad r \sim \mathcal{U}(0, 1), \quad \delta \sim \mathcal{U}(0, 1), \tag{15.38}
$$

where the beta distribution is defined in Chapter 4. The choice of uniform distributions for $\gamma$, $r$, and $\delta$ reflects limited prior knowledge about these parameters. As illustrated in Figure 15.9, the beta distribution can be tuned, through the choice of $\alpha$ and $\beta$, to quantify the degree to which individuals interact with others in the population. We consider two cases, Beta(2, 7) and Beta(0.2, 15), which reflect large and limited degrees of interaction, respectively. The initial values are taken to be $S_0 = 900$, $R_0 = 0$, and $I_0 = 100$ so that $N = 1000$. The scalar response is taken to be

$$
y = \int_0^{t_f} R(t, q)\, dt,
$$

where $R(t, q)$ is computed by numerically integrating the coupled system (15.37).
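The text does not specify which integrator is used, so the following is only a minimal sketch of how the response $y$ and $R(t_f)$ might be evaluated for many parameter samples at once. It assumes a fixed-step fourth-order Runge-Kutta scheme, $t_f = 5$, and the composite trapezoid rule for $y$. The names `sir_rhs` and `sir_response` are hypothetical. The product $\gamma k S$ can reach several hundred, so an explicit scheme needs a small step. Because the birth and death rates are equal, $S + I + R = N$ is conserved, which gives a simple correctness check.

```python
import numpy as np

def sir_rhs(S, I, R, gamma, k, r, delta, N):
    """Right-hand side of the SIR system (15.37), evaluated elementwise."""
    infection = gamma * k * I * S
    dS = delta * N - delta * S - infection
    dI = infection - (r + delta) * I
    dR = r * I - delta * R
    return dS, dI, dR

def sir_response(gamma, k, r, delta, S0=900.0, I0=100.0, R0=0.0,
                 tf=5.0, n_steps=10_000):
    """Integrate (15.37) with fixed-step RK4 for one or many parameter samples.

    Parameters may be scalars or equal-length arrays (one entry per sample).
    Returns y = int_0^tf R(t, q) dt (composite trapezoid rule) and the final
    state (S, I, R) at t = tf.
    """
    gamma, k, r, delta = (np.asarray(p, dtype=float) for p in (gamma, k, r, delta))
    shape = np.broadcast(gamma, k, r, delta).shape
    N = S0 + I0 + R0
    h = tf / n_steps
    S, I, R = np.full(shape, S0), np.full(shape, I0), np.full(shape, R0)
    y = 0.5 * h * R                              # trapezoid weight for R(0)
    args = (gamma, k, r, delta, N)
    for step in range(n_steps):
        a = sir_rhs(S, I, R, *args)
        b = sir_rhs(S + 0.5 * h * a[0], I + 0.5 * h * a[1], R + 0.5 * h * a[2], *args)
        c = sir_rhs(S + 0.5 * h * b[0], I + 0.5 * h * b[1], R + 0.5 * h * b[2], *args)
        d = sir_rhs(S + h * c[0], I + h * c[1], R + h * c[2], *args)
        S = S + h / 6 * (a[0] + 2 * b[0] + 2 * c[0] + d[0])
        I = I + h / 6 * (a[1] + 2 * b[1] + 2 * c[1] + d[1])
        R = R + h / 6 * (a[2] + 2 * b[2] + 2 * c[2] + d[2])
        y = y + (0.5 * h if step == n_steps - 1 else h) * R   # trapezoid weights
    return y, (S, I, R)

# Example: three parameter samples drawn from (15.38) with k ~ Beta(2, 7)
rng = np.random.default_rng(0)
M = 3
y, (S, I, R) = sir_response(rng.uniform(0, 1, M), rng.beta(2, 7, M),
                            rng.uniform(0, 1, M), rng.uniform(0, 1, M))
print(y, R)
```

Vectorizing over samples means that one call can evaluate all $M$ realizations needed by the sampling-based estimators.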


Case 1: Large Degree of Interactions, Beta(2, 7)

The choice $\alpha = 2$, $\beta = 7$ models the case when individuals have a large probability of interacting with up to half the population and a low probability of meeting with everyone.
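The claim about Beta(2, 7) can be quantified directly. For integer shape parameters, the beta CDF reduces to a binomial sum, which gives $P(k \le 0.5) = 247/256 \approx 0.965$. The sketch below compares this with the means of the two beta distributions. The helper names are hypothetical.

```python
from math import comb

def beta_mean(alpha, beta):
    """Mean alpha / (alpha + beta) of a Beta(alpha, beta) random variable."""
    return alpha / (alpha + beta)

def beta_cdf_integer(x, a, b):
    """P(K <= x) for K ~ Beta(a, b) with positive integers a, b.

    Uses I_x(a, b) = sum_{j=a}^{n} C(n, j) x^j (1 - x)^(n - j), n = a + b - 1.
    """
    n = a + b - 1
    return sum(comb(n, j) * x**j * (1 - x) ** (n - j) for j in range(a, n + 1))

print(beta_mean(2, 7))              # 2/9, about 0.222
print(beta_cdf_integer(0.5, 2, 7))  # 247/256, about 0.965
print(beta_mean(0.2, 15))           # about 0.0132
```

The much smaller mean of Beta(0.2, 15) reflects the limited-interaction case considered below.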



Figure 15.9. Beta distributions Beta(2, 7) and Beta(0.2, 15).
|                            | $\gamma$ | $k$     | $r$    | $\delta$ |
|----------------------------|----------|---------|--------|----------|
| $S_i$                      | 0.0997   | 0.0312  | 0.7901 | 0.1750   |
| $S_{T_i}$                  | -0.0637  | -0.0541 | 0.5634 | 0.2029   |
| $\mu_i^*$ ($\times 10^3$)  | 0.2532   | 0.2812  | 2.0184 | 1.2328   |
| $\sigma_i$ ($\times 10^3$) | 0.9539   | 1.6245  | 6.6748 | 3.9880   |

Table 15.3. Sobol indices and Morris sensitivity measures for $k \sim \text{Beta}(2, 7)$.


The first-order and total Sobol indices $S_i$ and $S_{T_i}$, computed using the relations (15.24) and (15.25) with $M = 1000$, are summarized in Table 15.3 along with the Morris measures $\mu_i^*$ and $\sigma_i$ given by (15.30) with $r = 20$ and $\ell = 40$. The total indices and absolute means both indicate that, for this case, the recovery rate $r$ has primary influence, the birth-death rate $\delta$ has secondary influence, and the infection and contact coefficients $\gamma$ and $k$ have negligible influence. The negative values for $S_{T_1}$ and $S_{T_2}$ reflect the approximate nature of the expressions (15.24) and (15.25) for small or moderate sample sizes $M$.
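Sampling-based Sobol estimates can be sketched with a pick-freeze scheme. The version below is an assumption and may differ from (15.24) and (15.25). It uses the first-order estimator of Saltelli et al. (2010) and Jansen's total-effect estimator, and it costs $M(p+2)$ model evaluations. Jansen's estimator is a mean of squares, so it cannot return negative $S_{T_i}$ values like those in Table 15.3. To apply it to the SIR example, `f` would wrap a numerical solution of (15.37). Here it is checked on an additive model with known indices.

```python
import numpy as np

def sobol_indices(f, sample, p, M, rng):
    """Pick-freeze estimates of first-order and total Sobol indices.

    f maps an (M, p) array of parameter samples to M responses, and
    sample(rng, M) draws an (M, p) array from the parameter distribution.
    First-order: Saltelli et al. (2010) estimator; total: Jansen estimator.
    """
    A, B = sample(rng, M), sample(rng, M)
    fA, fB = f(A), f(B)
    var = np.var(np.concatenate([fA, fB]))
    S, ST = np.empty(p), np.empty(p)
    for i in range(p):
        ABi = A.copy()
        ABi[:, i] = B[:, i]                    # A with its i-th column taken from B
        fABi = f(ABi)
        S[i] = np.mean(fB * (fABi - fA)) / var
        ST[i] = 0.5 * np.mean((fA - fABi) ** 2) / var
    return S, ST

# Check on y = q1 + 2 q2 with q_i ~ U(0, 1): exact S = ST = [0.2, 0.8]
rng = np.random.default_rng(0)
S, ST = sobol_indices(lambda Q: Q[:, 0] + 2 * Q[:, 1],
                      lambda rng, M: rng.uniform(0, 1, size=(M, 2)),
                      p=2, M=20_000, rng=rng)
print(S, ST)
```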

A representative realization of trajectories is plotted in Figure 15.10(a). Because of the large rate of interactions, the number of susceptible individuals rapidly diminishes with commensurate growth in the number of infected. The recovery is slower due to the magnitude of the rate $r$. The influence of the birth rate $\delta$ is easily observed in $S(t)$ after 0.5 seconds.

To illustrate the use of the model for uncertainty quantification, the density for $R(t_f)$ at $t_f = 5$ s was constructed from the $M = 1000$ realizations of the model and is plotted in Figure 15.11(a). The uncertainty in the parameters induces significant uncertainty in this response.
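The text does not say how the density was constructed from the realizations. One common choice is a Gaussian kernel density estimate, sketched below. The normal-reference (Silverman) bandwidth and the helper name `gaussian_kde_1d` are assumptions. The normally distributed stand-in data are hypothetical and only take the place of the $M = 1000$ values of $R(t_f)$.

```python
import numpy as np

def gaussian_kde_1d(samples, grid, bandwidth=None):
    """Gaussian kernel density estimate of a scalar response on a grid."""
    samples = np.asarray(samples, dtype=float)
    n = samples.size
    if bandwidth is None:
        # Normal-reference (Silverman) rule of thumb: 1.06 * std * n^(-1/5)
        bandwidth = 1.06 * samples.std(ddof=1) * n ** (-0.2)
    z = (grid[:, None] - samples[None, :]) / bandwidth
    return np.exp(-0.5 * z**2).sum(axis=1) / (n * bandwidth * np.sqrt(2.0 * np.pi))

# Hypothetical stand-in data in place of the M = 1000 values of R(t_f)
rng = np.random.default_rng(0)
samples = rng.normal(loc=300.0, scale=40.0, size=1000)
grid = np.linspace(100.0, 500.0, 801)
density = gaussian_kde_1d(samples, grid)
print(density.sum() * (grid[1] - grid[0]))  # close to 1
```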

The parameters $\gamma$ and $k$ are relatively noninfluential in this case because the number of contacts ensures rapid infection. Hence we can fix these parameters without significantly changing the distribution of responses. To illustrate, the density obtained with random $r$ and $\delta$ and with $\gamma = 0.5$, $k = 0.2$ fixed is also plotted in Figure 15.11(a). There is little difference between this density and the one obtained with random $\gamma$, $k$, $r$, and $\delta$.

Figure 15.10. Dynamics of $S(t)$, $I(t)$, and $R(t)$ for a single realization of parameters from (15.38) with (a) $k \sim \text{Beta}(2, 7)$ and (b) $k \sim \text{Beta}(0.2, 15)$.

Figure 15.11. Densities for $R(t_f)$ at $t_f = 5$ with (a) $k \sim \text{Beta}(2, 7)$ and (b) $k \sim \text{Beta}(0.2, 15)$; all parameters random and noninfluential parameters fixed.


Case 2: Limited Interactions, Beta(0.2, 15)


The mechanism in the model that we can directly control is the interaction coefficient $k$. As illustrated in Figure 15.9, the choice $k \sim \text{Beta}(0.2, 15)$ has a high probability of limited interactions, so this enforces isolation of infected individuals.

The Sobol indices and Morris sensitivity measures in Table 15.4 indicate that $k$ is now the most influential parameter and the birth-death rate $\delta$ is the least influential. The trajectory in Figure 15.10(b) illustrates a typical realization in which the disease does not spread and the number of infected diminishes due to the death rate $\delta$. This illustrates that, if the infection coefficient is not too high, isolation of infected individuals can effectively deter the spread of the disease.

The densities for $R(t_f)$ at $t_f = 5$, plotted in Figure 15.11(b), illustrate that the percentage of the population with a high recovery rate is small since few were infected. Furthermore, similar results are obtained when $\delta$ is fixed, as motivated by the results in Table 15.4. It is established in Exercise 15.5 that one can obtain a similar density with both $\gamma$ and $\delta$ fixed.

|                            | $\gamma$ | $k$    | $r$    | $\delta$ |
|----------------------------|----------|--------|--------|----------|
| $S_i$                      | 0.0711   | 0.6233 | 0.1450 | 0.0910   |
| $S_{T_i}$                  | 0.1428   | 0.7020 | 0.1801 | 0.0248   |
| $\mu_i^*$ ($\times 10^3$)  | 3.4296   | 4.6425 | 2.9320 | 1.7947   |
| $\sigma_i$ ($\times 10^3$) | 6.0782   | 9.3842 | 3.5811 | 2.0125   |

Table 15.4. Sobol indices and Morris sensitivity measures for $k \sim \text{Beta}(0.2, 15)$.


15.4 Notes and References 

An important objective of global sensitivity analysis is to identify noninfluential parameters, which can be fixed to reduce the complexity of models for subsequent model calibration and uncertainty propagation. Hence it can be used to guide which parameters are included in the HDMR surrogate models discussed in Section 13.5 or for parameter selection as detailed in Section 6.2.

However, as illustrated in Example 15.18, the relative influence of parameters can be a function of their distributions, which are often unknown and must be determined using Bayesian model calibration techniques. A conservative approach might be to treat parameters as uniform during this initial sensitivity analysis to reflect limited prior knowledge. This can yield poor results, however, if parameters are incorrectly fixed in surrogate or full-order models used to estimate parameter densities. This can be partially addressed by employing alternative algorithms to construct Morris seed values $q^*$ or by using local or global sensitivity techniques that do not utilize parameter distributions. Nevertheless, theoretical and algorithmic solutions to this problem in the context of global sensitivity analysis are generally lacking and constitute an area of current research.

The variance-based and Morris screening techniques each have advantages and disadvantages. The Sobol indices $S_i$ and $S_{T_i}$ provide a comprehensive measure that quantifies the influence of parameter uncertainties on the response variance, but their computational cost can be extensive or prohibitive. Morris screening techniques can be used to rank the relative influence of parameters at a fraction of this computational cost but, as illustrated in Example 15.16, they can yield incorrect results even for linear problems if global effects are not correctly incorporated. Furthermore, they provide qualitative rather than quantitative measures of each parameter's influence. Both methods generally rely on the assumption that parameters are mutually independent, although this assumption is easily relaxed for the screening methods by choosing appropriate methods to generate seed values $q^*$.
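The low cost of Morris screening comes from one-at-a-time trajectories: each of the $r$ trajectories needs only $p + 1$ model evaluations. Below is a minimal sketch on the unit hypercube. It assumes the common step $\Delta = \ell/(2(\ell-1))$, random seed values $q^*$ drawn from the grid, and a sample standard deviation with divisor $r - 1$. The conventions in (15.30) may differ, and parameters on other ranges would first be mapped to $[0, 1]^p$. The function name is hypothetical.

```python
import numpy as np

def morris_screening(f, p, r=20, levels=40, rng=None):
    """Morris elementary effects for f defined on the unit hypercube [0, 1]^p.

    Builds r one-at-a-time trajectories on an l-level grid with step
    Delta = l / (2 (l - 1)) and returns (mu_star, sigma): the mean absolute
    elementary effect and the sample standard deviation for each parameter.
    """
    rng = np.random.default_rng() if rng is None else rng
    delta = levels / (2 * (levels - 1))
    grid = np.arange(levels) / (levels - 1)
    starts = grid[grid <= 1 - delta + 1e-12]   # keeps q_i + Delta inside [0, 1]
    effects = np.empty((r, p))
    for j in range(r):
        q = rng.choice(starts, size=p)         # random seed value q*
        fq = f(q)
        for i in rng.permutation(p):           # perturb each coordinate once
            q_next = q.copy()
            q_next[i] += delta
            f_next = f(q_next)
            effects[j, i] = (f_next - fq) / delta
            q, fq = q_next, f_next             # continue along the trajectory
    mu_star = np.abs(effects).mean(axis=0)
    sigma = effects.std(axis=0, ddof=1)
    return mu_star, sigma

# Linear test function: every elementary effect equals its coefficient.
mu_star, sigma = morris_screening(lambda q: 3.0 * q[0] + q[1], p=2, r=10,
                                  levels=4, rng=np.random.default_rng(0))
print(mu_star, sigma)   # mu_star close to [3, 1], sigma close to [0, 0]
```

With $r$ trajectories the cost is $r(p+1)$ evaluations, compared with $M(p+2)$ for the pick-freeze Sobol estimators.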

The gradient-based methods of [1, 21, 56] fall within the same general framework as Morris screening in that local, linear sensitivities are evaluated at random input values to provide pseudoglobal sensitivity measures. Details regarding these approaches are provided in Section 6.2.2.

Readers are referred to [215, 216, 217] for an overview of issues pertaining to global sensitivity analysis. This includes regression-based sensitivity analysis methods, which we did not cover in this chapter. The use of stochastic polynomial methods to construct surrogate models that permit the analytic computation of Sobol indices from the expansion coefficients is detailed in [68, 241]. This includes a comparison of the methods for the Ishigami function (15.22), the Sobol function (15.32), and a finite element model for soil mass. The computation of Sobol sensitivity indices based on the cut-HDMR and RS-HDMR expansions of Section 13.5 is detailed in [153, 177] and illustrated in the context of a finite element model for a masonry wall. Finally, readers are referred to [65, 165] for details illustrating global sensitivity analysis and uncertainty quantification for biological models.


15.5 Exercises 

Exercise 15.1. Use the relation (15.16) to show that the first-order partial variances $D_i$ have the probabilistic interpretation (15.17).

Exercise 15.2. For the Ishigami function

$$
f(q) = \sin(q_1) + a \sin^2(q_2) + b\, q_3^4 \sin(q_1)
$$

discussed in Examples 13.8 and 15.5, establish the total and partial variance relations (15.23). For $a = b = 0.1$, apply Morris screening with various levels $\ell$ and stepsizes $\Delta$ and compare the parameter rankings with the analytic relations.
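The following sketch gives a numerical check for this exercise. It assumes $q_i \sim \mathcal{U}(-\pi, \pi)$, the standard Ishigami setting, and that this matches the setting of (15.23). It codes the standard analytic variances and compares the total variance with a Monte Carlo estimate. The nonzero partial variances are $D_1$, $D_2$, and $D_{13}$, and they sum to $D$. The benchmark values $a = 7$, $b = 0.1$ are chosen only for the check, and the helper names are hypothetical.

```python
import math
import numpy as np

def ishigami(q, a, b):
    """Ishigami function sin q1 + a sin^2 q2 + b q3^4 sin q1 for q of shape (M, 3)."""
    return np.sin(q[:, 0]) + a * np.sin(q[:, 1]) ** 2 + b * q[:, 2] ** 4 * np.sin(q[:, 0])

def ishigami_variances(a, b):
    """Analytic total and nonzero partial variances for q_i ~ U(-pi, pi)."""
    pi4, pi8 = math.pi ** 4, math.pi ** 8
    D = a**2 / 8 + b * pi4 / 5 + b**2 * pi8 / 18 + 0.5
    partial = {
        "1": b * pi4 / 5 + b**2 * pi8 / 50 + 0.5,
        "2": a**2 / 8,
        "13": 8 * b**2 * pi8 / 225,
    }  # D_3, D_12, D_23, and D_123 are all zero
    return D, partial

a, b = 7.0, 0.1
D, partial = ishigami_variances(a, b)
rng = np.random.default_rng(0)
q = rng.uniform(-np.pi, np.pi, size=(200_000, 3))
print(D, np.var(ishigami(q, a, b)))                    # analytic vs Monte Carlo
print({key: val / D for key, val in partial.items()})  # S_1, S_2, S_13
```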

Exercise 15.3. Consider the Sobol function

$$
y = \prod_{i=1}^{p} \frac{|4q_i - 2| + a_i}{1 + a_i}
$$

with $q_i \sim \mathcal{U}(0, 1)$. Establish the variance relations (15.33).

Exercise 15.4. Consider the function

$$
y = \prod_{i=1}^{p} \frac{|4q_i - 2| + a_i}{1 + a_i},
$$

discussed in Example 15.17, with $p = 6$ and $a = [0.2, 0.3, 0.2, 0.1, 0.4, 0.05]$. Compute the Sobol indices $S_i$ and $S_{ij}$, and show that the second-order effects are significant for this parameter set.
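For the Sobol function with independent $q_i \sim \mathcal{U}(0, 1)$, each factor has mean 1 and variance $D_i = 1/(3(1 + a_i)^2)$. Consequently $D_{ij} = D_i D_j$ and $D = \prod_i (1 + D_i) - 1$. The sketch below uses these standard facts to compute the indices for the parameter set in Exercise 15.4. The first-order indices account for only about 56% of the variance, and the second-order indices for roughly another third. The helper name is hypothetical.

```python
import itertools
import numpy as np

def sobol_g_indices(a):
    """Analytic first- and second-order Sobol indices of the Sobol g-function.

    Each factor (|4 q_i - 2| + a_i) / (1 + a_i) with q_i ~ U(0, 1) has mean 1 and
    variance D_i = 1 / (3 (1 + a_i)^2); the total variance is prod(1 + D_i) - 1.
    """
    a = np.asarray(a, dtype=float)
    Di = 1.0 / (3.0 * (1.0 + a) ** 2)
    D = np.prod(1.0 + Di) - 1.0
    S1 = Di / D
    S2 = {(i + 1, j + 1): Di[i] * Di[j] / D
          for i, j in itertools.combinations(range(a.size), 2)}
    return S1, S2

S1, S2 = sobol_g_indices([0.2, 0.3, 0.2, 0.1, 0.4, 0.05])
print(S1.sum(), sum(S2.values()))   # first-order total is well below 1
```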

Exercise 15.5. For the SIR model (15.37) in Example 15.18, compute the Sobol indices $S_i$ and $S_{T_i}$ and the Morris measures $\mu_i^*$ and $\sigma_i$ for $k \sim \text{Beta}(2, 7)$ and $k \sim \text{Beta}(0.2, 15)$, and compare both the results and the required computational times. For $k \sim \text{Beta}(0.2, 15)$, compare the density for $R(t_f)$ at $t_f = 5$ obtained with random $\gamma, k, r, \delta$ with that obtained with $\gamma$ and $\delta$ fixed at 0.5.




Appendix A

Concepts from Functional 
Analysis 


We summarize here concepts from functional analysis that are employed in the text. The discussion is necessarily selective, and readers are referred to the cited references for additional theory and details.


Functionals, Dual Spaces, and Hilbert Spaces

Throughout this discussion, we take $X$ to be a vector space, $Y$ to be a normed space, and $T : \text{dom}(T) \subseteq X \to Y$ to be a possibly nonlinear operator. For the case $Y = \mathbb{R}$, the operator is a real-valued functional, which we denote by $J$.


Definition A.1 (Dual Space $X^*$). Let $X$ be a normed space. The set $\mathcal{L}(X, \mathbb{R})$ of all bounded linear functionals on $X$ also constitutes a normed space with the norm defined by

$$
\|J\| = \sup_{\|x\| = 1} |J(x)|. \tag{A.1}
$$

This is termed the dual space $X^*$ of $X$.


Example A.2. Let $X = C[a, b]$ denote the space of continuous functions on the interval $I = [a, b]$ with the norm $\|x\| = \max_{t \in I} |x(t)|$, and define $J : X \to \mathbb{R}$ by

$$
J(x) = \int_a^b x(t)\, dt.
$$

The linearity of $J$ follows directly from the linearity of integration. Furthermore,

$$
|J(x)| = \left| \int_a^b x(t)\, dt \right| \le (b - a) \max_{t \in I} |x(t)| = (b - a)\|x\|,
$$

so that $\|J\| \le b - a$. If we consider $x = x_0 \equiv 1$, it follows that

$$
\|J\| \ge \frac{|J(x_0)|}{\|x_0\|} = \int_a^b 1\, dt = b - a,
$$

so that $\|J\| = b - a$.
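The bound $\|J\| = b - a$ can be illustrated numerically. The sketch below evaluates $|J(x)|/\|x\|$ for many random continuous functions sampled on a grid, using the trapezoid rule for the integral and the grid maximum for the norm. These discretizations are assumptions. The ratio never exceeds $b - a$, and $x_0 \equiv 1$ attains it. The family of test functions is hypothetical.

```python
import numpy as np

def J(x_vals, t):
    """J(x) = int_a^b x(t) dt, approximated by the composite trapezoid rule on grid t."""
    return np.sum(0.5 * (x_vals[1:] + x_vals[:-1]) * np.diff(t))

a, b = 0.0, 2.0
t = np.linspace(a, b, 2001)
rng = np.random.default_rng(1)
ratios = []
for _ in range(200):
    c = rng.normal(size=4)                     # random continuous test function
    x = c[0] + c[1] * np.sin(3 * t) + c[2] * np.cos(5 * t) + c[3] * t**2
    ratios.append(abs(J(x, t)) / np.max(np.abs(x)))   # |J(x)| / ||x||
print(max(ratios) <= b - a + 1e-12)
print(J(np.ones_like(t), t))                   # x0 = 1 attains the bound, approximately 2.0
```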









The norm on a vector space generalizes the concept of the length of a vector. Similarly, we will typically impose inner products, which provide measures of orthogonality as an extension of the Euclidean dot product. Specifically, we will generally consider operators defined on Hilbert spaces.

Definition A.3 (Hilbert Space). An inner product space is a vector space $X$ with an associated inner product $\langle \cdot, \cdot \rangle$. A complete inner product space is termed a Hilbert space. We note that the inner product defines a norm

$$
\|x\| = \sqrt{\langle x, x \rangle}
$$

and metric

$$
d(x, y) = \|x - y\|
$$

on $X$.


Example A.4. Consider an inner product space $X$, and define the functional

$$
\ell(x) = \langle u, x \rangle, \tag{A.2}
$$

where $u \in X$ is arbitrary and fixed. The linearity of $\ell$ is obvious, and it follows from the Cauchy-Schwarz inequality that

$$
|\ell(x)| \le \|u\|\,\|x\|,
$$

so that $\|\ell\| \le \|u\|$. Furthermore, $\ell(u) = \langle u, u \rangle = \|u\|^2$, so that $\|\ell\| \ge \|u\|$ and hence $\|\ell\| = \|u\|$.

Given a Hilbert space, we can always construct a bounded linear functional in this manner. One of the "big" theorems of functional analysis is the Riesz representation theorem, which establishes that in a Hilbert space the converse is also true. The reader is referred to [139] for proofs.

Theorem A.5 (Riesz Representation Theorem). To every bounded linear functional $\ell$ on a Hilbert space $H$, there corresponds a uniquely defined element $v \in H$ such that

$$
\ell(u) = \langle u, v \rangle
$$

for all $u \in H$. Furthermore, the norm of $\ell$ satisfies

$$
\|\ell\| = \|v\|.
$$

Remark A.6. The Riesz representation theorem establishes that Hilbert spaces are self-dual in the sense that for each $\ell \in H'$, there exists a unique $v \in H$ such that

$$
\ell(u) = \langle \ell, u \rangle_{H', H} = \langle u, v \rangle
$$

for all $u \in H$. Here the symbol $\langle \cdot, \cdot \rangle_{H', H}$ denotes the pairing of $H'$ and $H$, and $\langle \ell, u \rangle_{H', H}$ is the real number $\ell(u)$. The mapping $\ell \mapsto v$ is a linear isomorphism of $H'$ onto $H$.
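A finite-dimensional sketch makes the theorem concrete. It assumes $H = \mathbb{R}^n$ with the weighted inner product $\langle x, y \rangle_W = x^T W y$, where $W$ is symmetric positive definite and chosen here for illustration. The functional $\ell(x) = c^T x$ is then represented by $v = W^{-1} c$, not by $c$ itself, so the representer depends on the inner product. The code also checks $\|\ell\| = \|v\|_W$ by sampling directions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
G = rng.normal(size=(n, n))
W = G @ G.T + n * np.eye(n)          # SPD weight: <x, y>_W = x^T W y
c = rng.normal(size=n)               # functional ell(x) = c^T x

def inner(x, y):
    """Weighted inner product <x, y>_W."""
    return x @ W @ y

v = np.linalg.solve(W, c)            # Riesz representer: W v = c
u = rng.normal(size=n)
print(np.isclose(c @ u, inner(u, v)))            # ell(u) = <u, v>_W

# ||ell|| = sup_x |ell(x)| / ||x||_W equals ||v||_W (attained at x = v)
norm_v = np.sqrt(inner(v, v))
xs = rng.normal(size=(1000, n))
ratios = np.abs(xs @ c) / np.sqrt(np.einsum("ij,jk,ik->i", xs, W, xs))
print(ratios.max() <= norm_v + 1e-12, np.isclose(abs(c @ v) / norm_v, norm_v))
```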
Gateaux and Fréchet Derivatives

The local sensitivity analysis of Chapter 14 relies heavily on the concepts of Gateaux variations and differentials, which we summarize here. Additional details can be found in [50, 204].


Definition A.7 (Gateaux Variation, Differential, and Derivative). For $T : \text{dom}(T) \subseteq X \to Y$, consider $x \in \text{dom}(T)$ and arbitrary $\eta \in X$. If the limit

$$
\delta T(x; \eta) = \lim_{\varepsilon \to 0} \frac{1}{\varepsilon}\left[ T(x + \varepsilon \eta) - T(x) \right]
$$

exists for each $\eta \in X$, then $\delta T(x; \eta)$ is termed the Gateaux variation of $T$ at $x$ with increment or perturbation $\eta$.

For functionals $J$, the Gateaux variation, when it exists, is

$$
\delta J(x; \eta) = \frac{d}{d\varepsilon} J(x + \varepsilon \eta)\Big|_{\varepsilon = 0}.
$$

Note that for each fixed $x \in \text{dom}(J)$, $\delta J(x; \eta)$ is a functional with respect to $\eta \in X$. We also note that $\delta J(x; \eta)$ is neither necessarily linear nor continuous with respect to $\eta$, so it may not map $X$ to $X^*$. If $\delta J(x; \eta)$ is linear and continuous with respect to $\eta$, and hence $\delta J(x; \eta) = DJ(x)\eta$, then $\delta J : X \to X^*$ is termed the Gateaux differential and $DJ(x) : X \to \mathbb{R}$ is the Gateaux derivative of $J$ at $x$. Details regarding necessary and sufficient conditions to establish that $\delta J(x; \eta)$ is linear and continuous in $\eta$ can be found in [50].

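Example A.8. Let $J : \mathbb{R}^2 \to \mathbb{R}$ be defined by

$$
J(x) = \begin{cases} x_1^2\left(1 + \dfrac{1}{x_2}\right), & x_2 \neq 0, \\[4pt] 0, & x_2 = 0. \end{cases}
$$

Here

$$
\delta J(0; \eta) = \lim_{\varepsilon \to 0} \frac{1}{\varepsilon} J(0 + \varepsilon \eta) = \begin{cases} \dfrac{\eta_1^2}{\eta_2}, & \eta_2 \neq 0, \\[4pt] 0, & \eta_2 = 0, \end{cases}
$$

so that $\delta J(0; \eta)$ is not linear with respect to $\eta$.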
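A difference quotient gives a quick numerical view of a Gateaux variation and of its possible failure to be linear. The sketch below uses a different, classic illustrative functional, $J(x) = x_1^3/(x_1^2 + x_2^2)$ with $J(0) = 0$, chosen for this illustration and not taken from Example A.8. Its exact variation at the origin is $\delta J(0; \eta) = \eta_1^3/(\eta_1^2 + \eta_2^2)$. This variation exists in every direction but is not additive in $\eta$. The step `eps` is an assumption.

```python
import numpy as np

def J(x):
    """Illustrative functional (not Example A.8): x1^3 / (x1^2 + x2^2), J(0) = 0."""
    x1, x2 = x
    r2 = x1**2 + x2**2
    return 0.0 if r2 == 0 else x1**3 / r2

def gateaux_variation(J, x, eta, eps=1e-6):
    """One-sided difference quotient approximating delta J(x; eta)."""
    x, eta = np.asarray(x, dtype=float), np.asarray(eta, dtype=float)
    return (J(x + eps * eta) - J(x)) / eps

zero = np.zeros(2)
d1 = gateaux_variation(J, zero, [1, 0])    # exact value 1
d2 = gateaux_variation(J, zero, [0, 1])    # exact value 0
d12 = gateaux_variation(J, zero, [1, 1])   # exact value 1/2
print(d1, d2, d12, np.isclose(d12, d1 + d2))   # additivity fails
```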

We note that the Gateaux variation generalizes the concept of the directional derivative from calculus and the first variation in the calculus of variations. Because the Gateaux variation and derivative require no norm on $X$, they cannot be directly used to establish continuity. This stronger concept is provided by the Fréchet derivative, which generalizes the calculus concept of differentiability.


Definition A.9 (Fréchet Differential and Derivative). $T$ is said to be Fréchet differentiable at $x \in \text{dom}(T)$ in the normed space $X$ if for each $\eta \in X$ there exists $\delta T(x; \eta) = DT(x)\eta \in Y$, which is linear, is continuous with respect to $\eta$, and satisfies

$$
\lim_{\|\eta\| \to 0} \frac{\|T(x + \eta) - T(x) - \delta T(x; \eta)\|}{\|\eta\|} = 0.
$$

When it exists, $\delta T(x; \eta)$ is termed the Fréchet differential of $T$ at $x$ with increment $\eta$, and $DT(x)$ is termed the Fréchet derivative. Further relations between the Gateaux and Fréchet derivatives can be found in [50].
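For a smooth map between finite-dimensional spaces, the Fréchet derivative is the Jacobian, and the defining remainder ratio decays like $\|\eta\|$. The sketch below checks this along a single direction for a hypothetical map $T : \mathbb{R}^2 \to \mathbb{R}^2$. One direction only illustrates the definition; it does not prove Fréchet differentiability, which requires the limit uniformly over all $\eta \to 0$.

```python
import numpy as np

def T(x):
    """Illustrative smooth map T: R^2 -> R^2 (hypothetical, not from the text)."""
    return np.array([x[0] ** 2 + np.sin(x[1]), x[0] * x[1]])

def DT(x):
    """Jacobian of T, which serves as its Frechet derivative."""
    return np.array([[2 * x[0], np.cos(x[1])],
                     [x[1], x[0]]])

x = np.array([0.7, -0.3])
direction = np.array([1.0, 2.0]) / np.sqrt(5.0)
ratios = []
for h in [1e-1, 1e-2, 1e-3, 1e-4]:
    eta = h * direction
    remainder = T(x + eta) - T(x) - DT(x) @ eta
    ratios.append(np.linalg.norm(remainder) / np.linalg.norm(eta))
print(ratios)   # decreases roughly in proportion to ||eta||
```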


Adjoint Operators 

The use of adjoint operators is central to the adjoint sensitivity analysis procedure (ASAP) detailed in Chapter 14. We focus on Hilbert space adjoints and refer the reader to [50, 139] for the theory of adjoint operators defined on Banach spaces. The text [145] is recommended for additional historical perspective and development of adjoint operators for differential operators with various boundary conditions. We refer the reader to [107] for a description of adjoint operators in the context of design and sensitivity analysis for structural systems and to [221] for theory and examples illustrating the role of adjoints in PDEs, parameter estimation, and distributed control theory. Finally, the text [235] provides a very nice introduction to adjoint operators and their relation to Green's functions for differential equations.

Definition A.10 (Hilbert Space Adjoint). Let $H_1$ and $H_2$ be Hilbert spaces with associated inner products $\langle \cdot, \cdot \rangle_{H_1}$ and $\langle \cdot, \cdot \rangle_{H_2}$, and let $L : H_1 \to H_2$ be a bounded linear operator. The Hilbert space adjoint operator $L^*$ of $L$ is the bounded linear operator $L^* : H_2 \to H_1$ such that for all $u \in H_1$ and $v \in H_2$,

$$
\langle Lu, v \rangle_{H_2} = \langle u, L^* v \rangle_{H_1}.
$$

The norm is $\|L^*\| = \|L\|$.

As detailed in [139], the uniqueness of the adjoint operator results from the Riesz representation theorem, which establishes that the dual $H^*$ of a Hilbert space $H$ can be canonically identified with $H$ in the sense detailed in Remark A.6. If $L : H \to H$ is a bounded linear operator, $L$ is said to be self-adjoint if $L = L^*$.

Example A.11. Let $A : \mathbb{R}^n \to \mathbb{R}^m$ be a matrix, and consider the usual Euclidean dot product $\langle x, y \rangle = x^T y$. Since

$$
\langle Ax, y \rangle = (Ax)^T y = x^T A^T y = \langle x, A^T y \rangle,
$$

it follows that $A^* = A^T$, so the adjoint of a matrix is its transpose. Note that the first dot product is an inner product for $\mathbb{R}^m$, whereas the second is for $\mathbb{R}^n$. Similar analysis holds for linear operators $L : \mathbb{R}^n \to \mathbb{R}^m$ since, with respect to the standard bases, they can always be represented in terms of a unique $m \times n$ matrix.
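The identity $\langle Ax, y \rangle = \langle x, A^T y \rangle$ can be checked numerically. The sketch also illustrates that the adjoint depends on the inner products. With weighted inner products $\langle x, z \rangle_{W_1} = x^T W_1 z$ on $\mathbb{R}^n$ and $\langle y, w \rangle_{W_2} = y^T W_2 w$ on $\mathbb{R}^m$, the adjoint becomes $A^* = W_1^{-1} A^T W_2$. The diagonal weights are hypothetical choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 3, 4
A = rng.normal(size=(m, n))                  # A : R^n -> R^m
x, y = rng.normal(size=n), rng.normal(size=m)

# Euclidean inner products: the adjoint is the transpose
ok_euclid = np.isclose((A @ x) @ y, x @ (A.T @ y))

# Weighted inner products (hypothetical SPD weights W1 on R^n, W2 on R^m)
W1 = np.diag(rng.uniform(1, 2, size=n))
W2 = np.diag(rng.uniform(1, 2, size=m))
A_star = np.linalg.solve(W1, A.T @ W2)       # A* = W1^{-1} A^T W2
ok_weighted = np.isclose((A @ x) @ W2 @ y, x @ W1 @ (A_star @ y))
print(ok_euclid, ok_weighted)
```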


The difficulty is that differential operators are unbounded, so the specification of the domain is critical. The following definition specifies adjoints for unbounded linear operators.


Definition A.12. Let $H$ be a Hilbert space with inner product $\langle \cdot, \cdot \rangle$, and let $L : \text{dom}(L) \subset H \to H$ be an unbounded linear operator with $\text{dom}(L)$ dense in $H$. We define $\text{dom}(L^*)$ to be the set of elements $v$ such that there exists $w$ with $\langle Lu, v \rangle = \langle u, w \rangle$ for all $u \in \text{dom}(L)$. The adjoint operator $L^*$ with domain $\text{dom}(L^*)$ is that which satisfies

$$
\langle Lu, v \rangle = \langle u, L^* v \rangle \tag{A.3}
$$

for all $u \in \text{dom}(L)$ and $v \in \text{dom}(L^*)$. A densely defined operator $L : \text{dom}(L) \subset H \to H$ is self-adjoint if $L = L^*$ and $\text{dom}(L^*) = \text{dom}(L)$.

As will be illustrated in examples, the adjoint of a differential operator necessarily requires specification of initial or boundary conditions. This is in contrast to the formal adjoint, which simply describes the coefficients of the differential operator.

To motivate the approach used to construct adjoints for differential operators, we take $H = L^2(\Omega)$ so that

$$
\langle u, v \rangle = \int_\Omega u(x) v(x)\, dx.
$$

The bilinear identity (A.3) can then be interpreted as a generalized Green's identity or successive integration by parts, with adjoint boundary conditions chosen to ensure that boundary terms vanish.

Since differential operators $L$ are not defined for all $L^2$ functions, the domains $\text{dom}(L)$ must be chosen as subsets of sufficiently smooth functions that satisfy initial or boundary conditions. Due to their completeness and approximation properties, subsets of the Sobolev spaces provide a natural setting for analysis since they are continuously and densely embedded in $L^2(\Omega)$ and have trace properties that facilitate boundary analysis. Readers are referred to [6, 78, 265] for details regarding Sobolev spaces in the context of differential equations.

Example A.13. To illustrate the construction of the adjoint operator and boundary conditions for a boundary value problem, consider

$$
\begin{aligned}
Lu &= u''(x) = f(x), \quad 0 < x < 1,\\
B_1 u &= u'(0) - u(1) = 0,\\
B_2 u &= u'(1) = 0.
\end{aligned}
$$

The linear differential operator $L$ is thus $L = \frac{d^2}{dx^2}$, and $B_1$ and $B_2$ are boundary functionals that map sufficiently smooth functions $u$ to numbers $B_1 u$ and $B_2 u$. Successive integration by parts yields

$$
\int_0^1 u'' v\, dx = \int_0^1 u v''\, dx + P(u, v)\big|_0^1,
$$

where the bilinear form $P(u, v)$, sometimes referred to as the bilinear concomitant, is given by

$$
P(u, v)\big|_0^1 = u'(1)v(1) - u'(0)v(0) - u(1)v'(1) + u(0)v'(0).
$$

The choice $L^* = \frac{d^2}{dx^2}$ as the formal adjoint yields

$$
\langle Lu, v \rangle - \langle u, L^* v \rangle = P(u, v)\big|_0^1.
$$








To achieve the bilinear identity (A.3), adjoint boundary conditions are chosen to ensure that $P(u, v)\big|_0^1 = 0$. Since $u'(0) = u(1)$ and $u'(1) = 0$,

$$
P(u, v)\big|_0^1 = -u(1)\left[v(0) + v'(1)\right] + u(0)v'(0),
$$

where $u(0)$ and $u(1)$ are arbitrary. One choice for the adjoint boundary conditions is

$$
B_1^* v = v'(0) = 0, \qquad B_2^* v = v(0) + v'(1) = 0.
$$

We note that this choice is not unique, so we typically choose the minimal conditions required to eliminate boundary terms.

Although $L = L^*$, the boundary conditions differ so that $\text{dom}(L^*) \neq \text{dom}(L)$, and hence the operator is not self-adjoint. This can be quantified by noting that appropriate choices for the domains are

$$
\begin{aligned}
\text{dom}(L) &= \{u \in H^2(0, 1) \mid u'(0) - u(1) = 0,\ u'(1) = 0\},\\
\text{dom}(L^*) &= \{v \in H^2(0, 1) \mid v'(0) = 0,\ v(0) + v'(1) = 0\}.
\end{aligned}
$$

Example A.14. For the initial value problem

$$
Lu = u''(x), \quad 0 < x < 1, \qquad u(0) = u'(0) = 0,
$$

we have

$$
\int_0^1 (u'' v - u v'')\, dx = u'(1)v(1) - u(1)v'(1),
$$

so appropriate adjoint conditions are $v(1) = v'(1) = 0$. For initial value problems, the adjoint equations will always have final-time conditions, which is known to readers familiar with the adjoint or co-state relations arising in optimal control. This is further illustrated in Section 14.2.2.


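To motivate the construction of adjoints for partial differential operators, we consider the Laplacian $\Delta$ in $\mathbb{R}^n$, which satisfies Green's second identity

$$
\int_\Omega (v \Delta u - u \Delta v)\, dx = \int_{\partial\Omega} \left( v \frac{\partial u}{\partial n} - u \frac{\partial v}{\partial n} \right) dS.
$$

Hence the Laplacian is formally self-adjoint, and adjoint conditions are chosen to ensure that boundary terms vanish. For an arbitrary linear differential operator $L$, the corresponding relation is

$$
\int_\Omega (v L u - u L^* v)\, dx = \int_{\partial\Omega} P(u, v)\, dS, \tag{A.4}
$$

where $P(u, v)$ depends on the form of the differential operator.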
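Example A.15. Consider the diffusion operator

$$
L = \frac{\partial}{\partial t} - \left( \frac{\partial^2}{\partial x_1^2} + \frac{\partial^2}{\partial x_2^2} + \frac{\partial^2}{\partial x_3^2} \right)
$$

for $x \in \Omega_x \subset \mathbb{R}^3$ and $t \in [0, t_f]$. The space-time domain is $\Omega = \Omega_x \times [0, t_f]$ with boundary $\partial\Omega$. Green's formula yields

$$
\int_\Omega (v L u - u L^* v)\, dx\, dt = \int_{\partial\Omega} \mathbf{n} \cdot \left( \mathbf{e}_t u v + u \nabla_x v - v \nabla_x u \right) dS, \tag{A.5}
$$

where the formal adjoint operator is

$$
L^* = -\frac{\partial}{\partial t} - \left( \frac{\partial^2}{\partial x_1^2} + \frac{\partial^2}{\partial x_2^2} + \frac{\partial^2}{\partial x_3^2} \right),
$$

$\mathbf{e}_t$ is a unit vector in time, and $\nabla_x$ indicates the gradient with respect to the spatial variables. The bilinear concomitant is thus

$$
P(u, v) = \mathbf{e}_t u v + u \nabla_x v - v \nabla_x u.
$$

As detailed in [235], the fact that $\Omega$ is a cylinder in space-time can be exploited to simplify (A.5) to

$$
\int_0^{t_f}\!\! \int_{\Omega_x} (v L u - u L^* v)\, dx\, dt = \int_{\Omega_x} u v \Big|_{t=0}^{t=t_f} dx + \int_0^{t_f}\!\! \int_{\partial\Omega_x} \left( u \frac{\partial v}{\partial n} - v \frac{\partial u}{\partial n} \right) dS\, dt.
$$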


A.1 Exercises

Exercise A.1. Consider the differential equation

$$
(a_2(x) u')' + a_0(x) u = f(x)
$$

so that $L = \frac{d}{dx}\left(a_2 \frac{d}{dx}\right) + a_0 = D(a_2 D) + a_0$, where $D = \frac{d}{dx}$. Take the boundary functionals to be

$$
B_1 u = \alpha_{11} u(a) + \alpha_{12} u'(a), \qquad B_2 u = \beta_{21} u(b) + \beta_{22} u'(b).
$$

Show that $L$ is self-adjoint.

Exercise A.2. Consider the fourth-order operator

$$
L = D^2(a_2 D^2) + D(a_1 D) + a_0,
$$

where $a_0(x)$, $a_1(x)$, and $a_2(x)$ are arbitrary functions, and define the boundary operators to be

$$
B_1 u = u(a), \quad B_2 u = u''(a), \quad B_3 u = u(b), \quad B_4 u = u''(b).
$$

Note that these are typically termed pinned boundary conditions. Determine the adjoint operator $L^*$ and the adjoint boundary conditions. Is $L$ self-adjoint?